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Notation 


Lower  case  letters  denote  scalars,  underscored  lower  case  letters  denote 
vectors,  and  capital  letters  generally  denote  matrices. 


A  = 


b  = 
C  = 
ci  = 
C2  = 
det()  = 
dp  = 
E{}  = 
H  = 
I  = 
i  = 
K  = 
L  = 
In  []  = 
min(ti,t2)  = 
N(|!,G2)  = 
Pr(*)  = 
p(x;o)  = 
p(z1,...,zN;a)  = 


state  transition  matrix 

element  of  the  Hessian  of  negative  log-likelihood  function 
or  Fisher  Information  approximation  to  the  Hessian 
system  deterministic  input  matrix 
element  of  gradient  of  negative  log-likelihood  function 
signal  bias  or  random  walk  standard  deviation 
system  output  matrix 

inverse  of  exponentially  correlated  noise  time  constant 
exponentially  correlated  noise  scaling  parameter 
determinant  operator 
Brownian  motion  differential 
expected  value  operator 

fractional  Brownian  motion  dimension  parameter 
Fisher  Information  matrix 
when  not  used  as  an  index 
Kalman  filter  gain  matrix 
system  plant  noise  input  matrix 
natural  logarithm  operator 
minimum  value  of  ti  and  t2 
normally  distributed  with  mean  p  and  variance  a2 
probability  of  the  event  • 

probability  density  of  x  as  a  function  of  the  parameters  £ 
probability  density  function  of  the  random  variables 
Z\,  ...,  Z[sj  as  a  function  of  the  parameters  £ 
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p(x  I  £,’(2)  =  conditional  probability  density  of  x  given  the  random 
variable  y  as  a  function  of  the  parameters  & 
p(z(tk)  I  zk_1)  =  probability  density  of  z(tk)  given  zCtk-i),  z(tk-2),  •••/ z(tQ) 
r  =  Kalman  filter  pre-update  residual 

5  =  covariance  matrix  of  a  multivariate  normal  distribution 

tr  [  ]  =  trace  operator 

tk  =  time  index 

u  =  deterministic  input  vector 

Vh  =  variance  of  unsealed  fractional  Brownian  motion 
X  =  Fourier  transform  of  x(t) 

x  =  state  vector  or  vector  valued  stochastic  or  deterministic 
process 

$  =  optimal  estimate  of  state  vector 
z  =  measurement  vector  [zi,...,zn1t 

<A  =  vector  of  unknown  system  parameters 
=  parameter  estimate 

6  =  maximum  likelihood  estimate  of  parameters 

P  =  Brownian  motion  process 

pH  =  fractional  Brownian  motion  process 

T()  =  gamma  function 

At  =  discrete  time  increment 

Ay(tk)  =  increment  between  measurements  at  time  tk  and  at  time 
tk-l 

A&  =  parameter  adjustment 

5(t)  =  delta  (impulse)  function 

S(j)  =  discrete  form  of  delta  function 
8jk  =  Kronecker  delta  (1  if  j  =  k,  0  otherwise) 

C,  =  negative  log-likelihood  function  (sometimes  without 
constant) 

0  =  covariance  of  discrete  white  measurement  noise 
Q.  =  white  measurement  noise  vector 

ji  =  mean  of  measurements 

E  =  covariance  of  discrete  white  plant  noise 
£  =  white  plant  noise  vector 

X  =  covariance  matrix 
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a 

= 

standard  deviation  of  a  random  variable 

on 

= 

standard  deviation  of  fractional  Brownian  motion 

^xx 

= 

two-sided  power  spectral  density  of  x 

4>L 

= 

one-sided  power  spectral  density  of  x 

Oxx 

= 

estimate  of  power  spectral  density  function 

XX 

= 

estimate  of  one-sided  power  spectral  density  function 

= 

correlation  function  of  stationary  processes  x  and  y 

= 

estimate  of  correlation  function 

^xyCtk^k+j) 

= 

correlation  function  of  nonstationary  processes  x  and  y 

(0 

= 

probability  space 

* 

= 

convolution  operator 

1  1 

= 

magnitude  operator 

Subscripts 

i  = 

ith  element  of  a  vector 

ij  = 

ijth  element  of  a  matrix 

Superscripts 

T  = 

transpose  operator 

*  _ 

complex  conjugate 

-1  = 

matrix  inverse 

Acronyms 

fBm  = 

fractional  Brownian  motion 

FIMLOF  = 

Full  Information  Maximum  Likelihood  Optimal  Filtering 

PSD  = 

Power  Spectral  Density 

Chapter  1 


Introduction  and  Summary 


1.1  Modeling  Noise  Processes  and  Estimating  Noise  Parameters 

In  this  thesis,  maximum  likelihood  estimation  is  applied  to  estimating 
stochastic  noise  parameters  for  both  Markov  and  non-Markov  noise 
processes.  Dynamic  parameters,  such  as  trend,  were  simultaneously 
estimated  with  and  separated  from  stochastic  parameters.  The  goal  was  to 
develop  methods  of  determining  unknown  parameters  in  models  of  systems 
which  exhibit  power  spectral  densities  proportional  to  fP,  where  -2  <  p  <  0. 

There  are  many  naturally  occurring  systems  with  PSD  proportional  to 
fP.  This  behavior  can  persist  over  a  wide  frequency  range  [44].  The  particular 
value  P  =  -1  is  indicative  of  what  is  called  flicker  or  j  noise.  Since  no  simple 
Markov  noise  process  can  have  this  frequency  domain  characteristic  for  more 
than  a  narrow  frequency  band,  these  systems  are  commonly  approximated 
using  a  combination  of  several  independent  Markov  processes  [21]. 

An  alternate  approach  is  to  model  such  a  noise  process  using  fractional 
Brownian  motion  [26],  which  has  PSD  proportional  to  fP  (-3  <  P  <  -1)  for  all 
frequencies,  where  P  is  determined  by  a  parameter  of  the  process.  The 
increment  process  of  fractional  Brownian  motion,  called  fractional  Brownian 
noise,  has  PSD  proportional  to  fP  (-1  <  P  <  1)  for  all  frequencies. 

Because  the  Markov  approximation  to  fractional  Brownian  motion  can 
be  implemented  in  state  space  form,  this  model  is  commonly  used  in 
applications  where  the  goal  is  to  develop  a  control  law  for  a  system  [21].  Due 


to  its  minimum  parameter  form  for  representing  fP  noise,  the  fractional 
Brownian  motion  model  is  more  frequently  used  in  applications  where  the 
goal  is  to  analyze  a  given  signal,  e.g.  image  processing  [24]. 

Even  though  fractional  Brownian  motion  (fBm)  has  stationary  self¬ 
similar  increments,  they  are  not  independent  and  fBm  is  not  a  Markov 
process.  This  means  that  state  space  models  and  Kalman  filter  estimators 
cannot  be  applied  to  the  parameters  of  the  process. 

A  Kalman  filter  estimator  could  be  applied  to  estimating  the  stochastic 
and  dynamic  parameters  of  a  Markov  process  modeled  with  a  state  dynamic 
system.  The  parameters  would  be  estimated  along  with  the  states  by 
augmenting  the  state  vector  and  using  an  extended  Kalman  filter  (the 
augmented  system  will  be  non-linear,  because  the  dynamic  parameters 
generally  multiply  the  states,  even  if  the  original  system  is  linear  in  the 
states).  This  is  a  somewhat  artificial  approach  which  might  not  converge  [41]. 

As  an  alternate  approach,  maximum  likelihood  system  identification 
can  be  applied  to  estimating  the  stochastic  and  dynamic  parameters  in  a  state 
dynamic  system  in  conjunction  with  using  a  Kalman  filter  to  estimate  the 
states.  The  maximum  likelihood  technique  can  also  be  applied  to  estimating 
the  parameters  in  a  non-Markov  noise  process  such  as  fractional  Brownian 
motion  simultaneously  with  other  dynamic  and  Markov  noise  parameters. 
The  classical  Fisher  maximum  likelihood  approach  has  many  desirable 
properties,  and  is  naturally  applied  to  these  problems.  The  results  of  this 
thesis  show  that  it  can  be  applied  successfully. 

New  in  this  thesis  is  the  use  of  partial  derivatives  in  estimating 
fractional  Brownian  motion  parameters,  rather  than  a  brute  force  search  for  a 
maximum  of  the  likelihood  function.  Also  new  in  this  thesis  is  the 
estimation  of  other  noise  parameters  and  trend  simultaneously  with  the 
fractional  Brownian  motion  parameters.  In  addition,  trend,  white  noise, 
random  walk,  and  exponentially  correlated  noise  parameters  are  estimated 
for  a  Markov  state  dynamic  system  model.  This  thesis  provides  a  concrete 
example  of  this  generally  accepted  procedure. 
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1.2  Background  Material 

1.2.1  Maximum  Likelihood  Estimation 

Chapter  2  discusses  the  properties  of  parameter  estimators  in  general 
and  maximum  likelihood  estimators  in  particular.  For  any  estimator,  the 
Cramer-Rao  lower  bound  applies  with  the  covariance  of  unbiased  parameter 
estimates  being  greater  than  or  equal  to  the  elements  of  the  inverse  of  the 
Fisher  information  matrix  [45].  The  Fisher  information  matrix  is  the  expected 
value  of  the  Hessian  of  second  partial  derivatives  of  the  negative  log- 
likelihood  function,  where  the  likelihood  is  the  probability  density  of  the 
observables  as  functions  of  the  parameters  [45].  Rigorous  calculations  show 
that  the  Fisher  information  matrix  is  also  equal  to  the  expected  value  of  the 
dyadic  product  of  the  gradient  of  the  negative  log-likelihood,  which  only 
involves  first  partial  derivatives  [45]. 

Maximum  likelihood  estimation  seeks  parameter  values  which 
maximize  the  likelihood  function  (or  minimize  the  negative  log  likelihood), 
which  means  that  parameter  values  are  chosen  that  make  it  most  likely  that 
the  observations  that  did  occur  would  have  occurred.  Under  many 
circumstances,  maximum  likelihood  estimates  have  desirable  theoretical 
properties,  such  as  being  asymptotically  consistent,  unbiased,  efficient, 
normally  distributed  about  the  true  parameter  values,  and  attaining  the 
Cramer-Rao  lower  bound  [45]. 

Determination  of  maximum  likelihood  estimates  in  non-linear 
problems  is  done  iteratively  starting  from  a  first  guess  for  the  parameter 
values.  The  adjustments  to  the  parameter  values  at  each  stage  of  the  iteration 
are  determined  by  solving  a  set  of  linear  equations  whose  coefficient  matrix  is 
the  Fisher  information  approximation  to  the  Hessian  of  the  negative  log- 
likelihood  function. 
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1.2.2  Power  Spectral  Density  Analysis 

Chapter  3  discusses  stochastic  processes,  the  autocorrelation  function  of 
a  stochastic  process,  and  the  Power  Spectral  Density  (PSD)  which  is  defined  as 
the  Fourier  transform  of  the  autocorrelation  function. 

The  autocorrelation  is  the  expected  value  of  the  product  of  the 
stochastic  process  values  at  two  different  times.  For  a  stationary  stochastic 
process,  the  autocorrelation  is  only  a  function  of  the  difference  of  the  two 
times.  For  an  ergodic  stationary  stochastic  process,  expected  value  ensemble 
averages  can  be  replaced  by  time  averages  over  any  realization  of  the  process. 
Therefore,  an  estimate  of  the  autocorrelation  function  of  an  ergodic  stochastic 
process  is  the  time  domain  autocorrelation  over  a  finite  interval  of  a 
particular  realization  from  the  process,  divided  by  the  time  interval  [34], 

The  Fourier  transform  of  this  estimate  of  the  autocorrelation  function 
is  an  estimate  of  the  PSD  of  the  process.  By  the  frequency  domain  properties 
of  time  autocorrelation  (Appendix  A),  the  estimate  of  the  PSD  is  then  equal  to 
the  magnitude  squared  of  the  Fourier  transform  of  a  finite  time  span  of  the 
particular  realization  of  the  stochastic  process.  Finally,  this  estimate  of  the 
PSD  can  be  computed  using  the  discrete  fast  Fourier  transform  of  samples 
from  the  realization  [34], 

PSD  analysis  is  employed  as  an  adjunct  to  the  time  domain  estimation 
of  stochastic  noise  parameters.  It  is  checked  that  sample  paths  generated 
using  random  number  generators  and  employing  the  estimated  parameters 
have  the  same  PSD  as  that  of  the  original  data. 


1.2.3  Markov  Noise  Processes 

Chapter  4  discusses  the  Markov  and  Martingale  properties  for 
stochastic  processes,  the  Weiner  random  walk  and  white  noise  processes, 
stochastic  integrals  and  differential  equations,  exponentially  correlated  noise, 
and  ways  of  simulating  noise  processes. 
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The  Markov  property  is  that  the  expected  vaiue  of  the  future  given  the 
present  and  the  past  is  equal  to  the  expected  value  of  the  future  given  the 
present  [18].  This  is  an  expression  of  the  principle  of  causality  from  physics 
for  stochastic  processes.  In-so-far  as  non-Markov  behavior  is  seen  in  nature 
(such  as  fP  noise  over  a  wide  frequency  band  with  -2  <  (3  <  0),  one  could 
suppose  that  it  is  due  to  not  having  enough  states  in  the  model  of  the  natural 
system.  However  it  may  be  inconvenient  to  have  as  many  states  as  would  be 
required  to  practically  model  the  noise. 

A  martingale  is  a  stochastic  process  for  which  the  expected  value  of  the 
future  is  equal  to  the  present  [13].  A  Weiner  random  walk  process  P(t)  is  a 
Gaussian  process  with  stationary  independent  increments.  It  is  a  martingale 
with  E{[p(t2)-p(tj)]2}  =  a2  I  t2-t|  I,  and  it  provides  a  model  of  the  Brownian 
motion  that  a  small  particle  buffeted  by  fluid  molecules  undergoes.  It  is 
almost  surely  continuous  and  non-dirferentiable  [18]. 

White  noise  can  be  considered  as  the  approximate  derivative  of  a 
Weiner  process.  The  0  and  -2  log-log  PSD  slopes  of  white  noise  and  random 
walk  noise  are  derived.  The  -2  and  +2  log-log  PSD  slopes  of  trend  and 
quantization  noise  are  also  derived. 

The  Ito  stochastic  integral  is  defined  relative  to  the  differential  dp  of  a 
Weiner  process.  The  differential  in  terms  of  increments  of  the  process  can  be 
used,  even  though  the  derivative  does  not  exist.  The  independent 
increments  properties  of  the  Weiner  process  are  heavily  used,  so  that,  for 
example,  the  stochastic  integral  could  not  be  defined  relative  to  the 
differential  dpH  of  a  fractional  Brownian  motion  process.  Results  about  the 
integration  of  stochastic  differential  equations  are  stated. 

The  stochastic  differential  equation  for  exponentially  correlated  noise  is 
given,  and  its  PSD  is  computed.  The  log-log  PSD  slope  is  0  at  low  frequencies 
transitioning  to  -2  at  high  frequencies,  so  that  a  number  of  exponentially 
correlated  noise  processes  could  be  used  to  approximate  a  -1  log-log  PSD  slope 
over  a  finite  frequency  band. 

A  technique  is  described  for  simulating  these  various  noise  processes 
using  a  random  number  generator  on  a  computer. 
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1.2.4  Fractional  Brownian  Motion 

Fractional  Brownian  motion  pH(t)  is  defined  in  terms  of  a  certain 
stochastic  integral  relative  to  a  Weiner  process  differential  dp.  The  integrand 
is  in  fact  that  used  in  the  ordinary  calculus  definition  of  fractional  derivatives 
and  integrals.  Thus  fractional  Brownian  motion  can  be  considered  as  the 
fractional  derivative  or  integral  of  ordinary  Brownian  motion,  depending  on 
the  value  of  the  parameter  H  in  the  definition  (0  <  H  <  1).  It  reduces  to 
ordinary  Brownian  motion  when  H  = 

Fractional  Brownian  motion  has  stationary  and  self-similar  but  not 
independent  increments  with  E{[pH(t2)-pH(ti)l2}  =  °h2  I  t2_tl  1 2H-  It  is  unique, 
in  that  any  stochastic  process  with  these  properties  is  fractional  Brownian 
motion  multiplied  by  a  constant. 

The  autocorrelation  function  of  fractional  Brownian  motion  (fBm)  is 
calculated,  from  which  it  is  derived  that  its  log-log  PSD  slope  is  (-1-2H).  This 
is  done  by  showing  that  the  fBm  derivative  process  has  log-log  PSD  slope 
(1-2H). 

Two  techniques  for  generating  simulated  fractional  Brownian  motion 
sample  paths  using  a  random  number  generator  are  presented. 


1.3  Summary  of  Maximum  Likelihood  Fits  to  Real  and  Simulated  Data 
1.3.1  Maximum  Likelihood  System  Identification  for  Markov  Processes 

Chapter  6  discusses  maximum  likelihood  system  identification  for  a 
state  dynamic  system  model,  and  applies  the  technique  to  estimating  trend, 
white  noise,  random  walk,  and  exponentir  lly  correlated  noise  parameters  in 
real  and  simulated  data. 

Maximum  likelihood  system  identification  runs  a  maximum 
likelihood  estimator  on  dynamic  and  stochastic  parameters  and  a  Kalman 
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filter  on  the  states.  This  technique  is  sometimes  called  Full  Information 
Maximum  Likelihood  Optimal  Filtering  (FIMLOF). 

The  general  system  identification  formulas  are  presented  for  a  discrete 
linear  dynamic  system  with  a  vector  observable,  and  then  for  the  specific  case 
discussed  in  this  thesis  of  a  scalar  observable  that  is  the  sum  of  a  trend,  white 
noise,  random  walk,  and  exponentially  correlated  noise.  The  system 
identification  formulas  were  coded  in  computer  software  to  run  a  Kalman 
filter  on  the  states  given  a  first  guess  to  the  parameters,  adjust  the  values  of 
the  parameters  using  the  partial  derivatives  of  the  log-likelihood  function 
which  are  generated  in  running  the  Kalman  filter,  and  repeat  the  process 
until  convergence  is  obtained  to  the  maximum  likelihood  estimates  of  the 
stochastic  and  dynamic  parameters.  A  Cramer-Rao  lower  bound  for  the 
uncertainty  of  the  parameter  estimates  is  provided  by  the  inverse  of  the 
Fisher  information  matrix  which  is  calculated  during  this  process. 

The  results  were  that  the  FIMLOF  method  provides  an  accurate 
estimate  of  Markov  noise  parameters  if  a  sufficient  number  of  measurements 
are  available.  The  accuracy  of  such  estimates  was  verified  by  comparing  the 
Cramer-Rao  lower  bound  to  the  estimation  error  in  the  case  of  computer 
generated  sample  paths  where  the  "true"  parameter  values  were  available. 
When  the  method  was  applied  to  experimental  data  displaying  a  -1  log-log 
PSD  slope,  the  frequency  response  characteristics  of  the  model  using  Markov 
noise  parameter  estimates  matched  those  of  the  real  system. 


1.3.2  Maximum  Likelihood  Estimation  of  Fractional  Brownian  and  Other 

Parameters 

In  Chapter  7,  the  likelihood  function  for  the  sum  of  fractional 
Brownian  and  other  Gaussian  noise  processes  is  computed  from  the 
autocorrelation  functions  of  the  processes,  for  both  the  increment  and  sum  of 
increment  formulations.  The  iterative  determination  of  trend,  fractional 
Brownian,  and  Markov  noise  parameters  was  coded  on  a  computer  using  this 
expression  for  the  likelihood  function  with  an  N  x  N  measurement 
covariance  matrix,  where  N  is  the  number  of  observations.  Fits  to  simulated 
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data  were  done  with  N  =  128  and  with  N  =  200.  There  are  obvious 
computational  difficulties  for  very  large  N.  The  non-Markov  nature  of 
fractional  Brownian  motion,  with  correlations  extending  over  all 
observations,  makes  this  batch  approach  rather  than  a  sequential  approach 
necessary. 

Using  this  approach,  it  was  possible  to  accurately  estimate  both 
stochastic  and  deterministic  parameters  in  cases  involving  combinations  of 
computer  generated  sample  paths  of  fractional  Brownian  motion,  trend,  and 
white  noise.  The  accuracies  of  these  estimates  were  verified  by  comparing  the 
Cramer-Rao  lower  bound  to  the  estimation  errors.  However,  when  fractional 
Brownian  motion  was  combined  with  exponentially  correlated  noise,  N  =  200 
measurements  were  insufficient  to  allow  the  algorithm  to  converge.  At  this 
point,  the  need  for  more  measurements  came  into  conflict  with  the 
computational  expense  of  inverting  the  N  x  N  measurement  covariance 
matrix  and  calculating  its  partial  derivatives. 

A  successful  estimation  of  fractional  Brownian  motion  parameters  was 
also  made  using  the  -1  log-log  PSD  slope  experimental  data  that  had  been 
used  in  FIMLOF  estimation. 
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Maximum  Likelihood  Estimation 


2.1  Parameter  Estimation 

The  goal  of  this  thesis  is  to  estimate  the  values  of  unknown  parameters 
in  the  models  of  fractional  Brownian  motion  and  Markov  noise  processes. 
The  information  that  is  available  for  use  in  this  estimate  is  a  set  of 
measurements  of  the  output  of  the  model.  There  are  several  methods  which 
are  available  for  use  in  problems  of  this  sort.  Maximum  likelihood 
estimation  is  chosen  because  of  its  simplicity  and  its  well  documented 
favorable  properties  which  will  be  presented  in  this  chapter. 


2.2  Likelihood  Function  and  Negative  Log-Likelihood  Function 

Let  z  =  [z\, ...,  zjyjF  be  observations  or  measurements  at  times  tj, ...,  tjsj  of 
some  system  involving  dynamic  and  noise  processes  dependent  on 
parameters  a  =  [aj, ... ,  aq]T.  The  likelihood  function  is  the  joint  probability 
density  of  the  measurements  as  a  function  of  the  measurement  and 
parameter  values: 

p(z;a)  =  p(zlr ... ,  zN ;  alr ...  ,aq)  (2.2-1) 

Since  p(z;a)  is  a  probability  density. 
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J  p(z;a)  dz  =  1 


(2.2-2) 


Taking  partial  derivatives  of  both  sides  of  the  above  equation  yields 


r  3p(z;&) 

J  aoq 
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^  daidoij 


dz 
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(2.2-3) 


The  negative  log-likelihood  is  the  negative  of  the  natural  logarithm  of 
the  likelihood  function: 

C(z ;&)  =  -ln[p(z;<2)]  (2.2-4) 

The  maximum  likelihood  estimates  for  the  parameters  &  are  the  values  that 
maximize  the  likelihood  function  or  equivalently  minimize  the  negative 
log-likelihood  function  given  a  set  of  measurements  z. 


2.3  Fisher  Information  Matrix 

The  Fisher  information  matrix  is  the  expected  value  of  the  Hessian  of 
second  partial  derivatives  of  the  negative  log-likelihood  function: 


I 


ij 
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(2.3-1) 


where  E{  }  denotes  expectation  (integration  over  the  probability  density  of  the 
random  variable).  As  will  be  explained  in  the  sequel,  this  matrix  provides  a 
measure  of  the  information  contained  in  a  parameter  estimate. 


It  will  be  useful  to  have  a  simplified  expression  for  the  information 
matrix  which  does  not  involve  second  partial  derivatives.  By 
Equations  (2.2-3)  and  (2.2-4) 
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d2p(z;g) 
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(2.3-2) 


2.4  Cramer-Rao  Lower  Bound 

An  estimate  6cj  of  a  parameter  aj  is  a  function  of  the  measurements: 

5j  =  fj(Z|, Zjsj)  (2.4-1) 

As  a  function  of  the  measurement  random  variables  it  is  also  a  random 
variable.  The  function  fj  cannot  be  arbitrarily  chosen  if  the  parameter 
estimate  is  to  have  good  properties.  One  desirable  property  of  an  estimator  is 
that  it  be  unbiased: 

E{5j}  =  true  value  of  Oj  (2.4-2) 

Another  desirable  property  is  that  its  variance  be  small,  although  there  is  a 
lower  bound  as  to  how  small  it  can  be. 

In  the  case  of  a  scalar  parameter,  if  a  is  an  estimate  of  the  parameter  a 
then  [7] 


J  (a  -  a)  p(z;a)  dz  =  b(a)  (2.4-3) 

where  b(a)  is  the  bias  associated  with  the  estimate.  Assuming  that  p(z;a)  has 
a  first  derivative  and  taking  the  partial  derivative  of  each  side  gives 
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f  5 

J  da  ‘  a)  d2  =  b'(a)  (2.4-4) 

where  b'(a)  is  the  derivative  of  the  bias  with  respect  to  a.  Taking  the 
derivative  of  the  product  gives 


f  r*\/'r7*nA  rl?  _L_  f 


J  (a  - a)  { ^  ln[p(z;a)] }  p(z;a)  dz  =  1  +  b'(a)  (2.4-6) 

At  this  point,  the  Schwarz  inequality  for  integration  with  respect  to  the 
measure  p(z;oc)  dz  may  be  applied  to  give 

J  (a  -  a)2  p(z;a)  dz  J  { ^  ln[p(z;a)]  }2  p(z;a)  dz  >  [1  +  b’(a)]2  (2.4-7) 


Finally,  using  the  definition  of  the  expected  value  operator  and  the  scalar 
form  of  Equation  (2.3-2)  leaves  a  lower  bound  for  the  variance  of  an  estimate: 


E{(&-cc)2}  > 


[1  +  b'(oc)]2 
I 


(2.4-8) 


This  lower  bound  is  known  as  the  Cramer-Rao  lower  bound.  Equation  (2.4-8) 
shows  that  for  an  unbiased  scalar  estimator,  the  Cramer-Rao  lower  bound  is 
equal  to  the  inverse  of  the  Fisher  information  matrix,  which  has  specialized 
to  a  scalar. 


In  the  case  of  an  unbiased  estimate  of  q  unknown  parameters,  this 
lower  bound  generalizes  to  [45] 

E{(a-  a)2}  -  I-l  >  0  (2.4-9) 

which  means  that  every  element  of  the  covariance  of  the  estimate  must  be 
greater  than  the  corresponding  element  of  the  inverse  of  the  Fisher 
information  matrix.  Thus  the  Cramer-Rao  lower  bound  for  the  variance  of 
the  ith  parameter  estimate  is  equal  to  [I_1lii-  In  this  manner,  the  Fisher 
information  matrix  provides  a  measure  of  the  amount  of  information  an 
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estimator  employs  by  providing  a  limit  to  the  accuracy  with  which  each 
parameter  may  be  estimated. 


2.5  Properties  of  Maximum  Likelihood  Estimates 

As  stated  in  Section  2.2,  maximum  likelihood  estimates 
a  =  [6q, ...,  6tq]T  of  the  parameters  are  those  for  which  the  likelihood  function 
p(z;a)  is  a  maximum,  or  equivalently  for  which  the  negative  log-likelihood 
£(z;&)  is  a  minimum.  In  other  words,  maximum  likelihood  estimated 
parameter  values  are  such  that  it  is  most  likely  that  the  observations  that  did 
occur  would  have  occurred. 

In  order  for  maximum  likelihood  estimation  to  be  carried  out,  it  is 
necessary  that  for  two  different  parameter  vectors  and  fit.",  the  joint 
probability  density  of  the  observables  that  occurred  should  not  be  the  same: 

p(z;fit')  *  p(z;&")  (2.5-1) 

As  stated  at  the  start  of  this  chapter,  one  reason  for  using  maximum 
likelihood  estimation  in  this  thesis  was  the  fact  that  it  has  favorable 
properties.  These  properties  are  that  as  the  number  of  measurements 
approaches  infinity,  the  maximum  likelihood  estimate  is  consistent,  efficient, 
and  sufficient.  These  properties  are  explained  in  the  following  paragraphs. 

If  an  estimate  &  converges  in  probability  to  the  true  value  fiL  of  a 
parameter  vector  fit  as  the  number  of  measurements  N— it  is  called  a 
consistent  estimate  of  fit  [45].  This  means  that  the  estimate  is  unbiased  and  the 
covariance  of  the  estimate  goes  to  zero  as  the  number  of  measurements 
approaches  infinity. 

If  an  estimate  &  is  an  unbiased  estimate  for  &  such  that  no  other 
unbiased  estimate  has  a  smaller  variance,  then  it  is  called  an  efficient  estimate 

of  a  [45].  An  asymptotically  efficient  estimate  is  efficient  as  the  number  of 
measurements  becomes  large. 
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If  £  is  an  unbiased  estimator  for  a  such  that  for  any  statistic  a  (function 
of  the  observables  z),  the  distribution  of  the  conditional  random  variable  a  IS 
does  not  depend  on  a,  then  &  is  a  sufficient  estimator  for  a[45]. 

If  z  =  [z\, ...,  zn]T  are  independent  measurement  random  variables 
with  the  same  probability  distributions  q(zj;a),  so  that  the  joint  density  is 
p(z;a)  =  q(z];a)—q(zjsj;a)/  then  the  maximum  likelihood  estimates  [64, ey 
are  asymptotically  consistent,  efficient,  sufficient,  normally  distributed  about 
the  true  value  a./  and  the  Cramer-Rao  lower  bound  becomes  tight  [45]. 

If  the  measurements  are  not  independent  identically  distributed 
random  variables,  it  can  be  shown  [28]  that  the  maximum  likelihood 
estimates  are  still  asymptotically  consistent,  unbiased,  efficient,  and  normally 

distributed  abouL  the  true  value  a?  However,  this  requires  that  the  first, 
second,  and  third  partial  derivatives  of  the  likelihood  function  exist  over  the 
admissible  range  of  parameters.  Thus,  maximum  likelihood  estimation  is 
still  attractive,  and  the  Cramer-Rao  lower  bound  remains  a  useful  measure  of 
the  accuracy  of  the  estimator  in  the  limit  of  a  large  number  of  observations. 


2.6  Iterative  Determination  of  Maximum  Likelihood  Estimates 


Let  a  =  [&i,  ...  ,  otq]T  be  the  maximum  likelihood  estimates  of  the 
parameters  a  For  these  values  the  likelihood  function  is  a  maximum,  or  the 
negative  log-likelihood  is  a  minimum: 

C(z;&)  =  minimum  (2.6-1) 

where  z  =  [zlr ...,  z^F  are  the  measurements.  This  means  that 


8£(z;a)  I 
dax  'a=a 


(2.6-2) 


Let  fio  =  [aj0, ...,  (Xq0]T  be  first  guesses  for  the  values  of  the  parameters, 

with 
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Aoj  =  Oj-ctp  ,  j  =  l,~,q  (2.6-3) 

Given  eto  and  the  measurements  z  =  [z-j,  ...,  zN]T,  the  negative  log-likelihood 
function  £(z;&o)  is  calculated  along  with  its  first  and  second  partial  derivatives 
with  respect  to  the  cq  in  order  to  determine  the  adjustments  AtXj  to  the  first 
guesses  Oj0. 

Approximating  Equation  (2.6-2)  with  the  first  two  terms  of  a  Taylor 
series  expansion  gives 

o  =  I 

3(Xj  *&=& 


acfeffl)  I  +  jr  a2C(z;ffi)  I  A 
'sl=Oo  jfi  doqdaj  ' 


(2.6-4) 


This  leads  to  the  following  set  of  linear  equations  to  solve  for  the  update  Aocj 
to  the  guesses  0CjO  to  approach  the  maximum  likelihood  estimates  6cj: 

2  Ajj  Aotj  =  Bj  ,  i  =  1, ...,  q  (2.6-5) 

where 


n  3C(z ;o) 

Bi  ="  3a, 

/  i  =  1,  — /  q 

(2.6-6) 

92£(z;&) 
**  3aj9(Xj 

,  i,j  =  1, ...,  q 

&=&o 

(2.6-7) 

The  expected  value  of  the  coefficient  matrix  Ajj  is  the  Fisher 
information  matrix.  As  an  approximation,  Ajj  can  be  replaced  by  its  expected 
value,  obtaining  by  Equations  (2.3-1)  and  (2.3-2) 


Aij  s  E  { 


3C(sa)  3^(sa)l  } 

0oci  9oij 


i/j  =  1/ 


(2.6-8) 


With  the  Fisher  information  approximation  to  the  Hessian  of  the 
negative  log-likelihood  function,  only  the  gradient  vector  of  first  partial 
derivatives  of  the  negative  log-likelihood  function  need  be  calculated.  In 
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some  cases  in  which  both  the  exact  and  approximate  Hessians  were  calculated 
in  this  thesis,  the  Fisher  information  approximation  gave  better  iterative 
convergence  to  the  maximum  likelihood  parameter  estimates  than  using  the 
exact  Hessian  (see  Section  6.6). 

Using  the  new  values  of  the  parameters  obtained  by  solving 
Equations  (2.6-5),  the  negative  log-likelihood  and  its  partial  derivatives  are  re¬ 
calculated  and  further  adjustments  Aotj  are  made  to  the  parameters.  The 
iteration  continues  until  convergence  is  obtained  to  the  maximum  likelihood 
estimates  of  the  parameters. 


2.7  Relation  to  Least  Squares  Estimates 

Suppose  the  measurements  zk  are  related  to  theoretical  model 
functions  fk  by 

zk  =  fkO*/fl)  +  £k  /  k=l,...,N  (2.7-1) 

where  ek  are  independently  normally  distributed  zero  mean  random  errors 
with  standard  deviations  Sk.  The  likelihood  function  is 

,  ,  1  ^-Z(zk-fk(tk;fi))2/(28k2)  (2.7-2) 

Pfe2)  "  (2It)N/26r.5N  6 

Maximum  likelihood  estimates  of  a  which  maximize  Equation  (2.7-2) 
or  minimize  the  negative  log-likelihood 

r  N  1  1  ^  (zk-fk(tk;a))2 

C(z;ffl)  =  [  ~2  ln(2n)  +  ln(61-**5N)  ]  +  - ^2  (2.7-3) 

k=l  °k 

are  the  same  as  weighted  least  squares  estimates  which  minimize  the  £  term 
in  Equation  (2.7-3),  presuming  that  the  constant  part  in  brackets  [  ]  does  not 
depend  on  oq, ...,  0Cq. 

The  condition  for  a  minimum  is 
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3£(z;g)  _  y  zk~fk(tk;&)  34(tk/'fiO 

a«i  1^1  8k2  da* 


0  when  &  =  a  (2.7-4) 


The  Fisher  information  matrix  Equation  (2.6-8)  is 

y  J_  9fk(z;fi0  5fk(z;a) 

lj  5k2  a«i  3cXj 


q 


(2.7-5) 


since  the  partial  derivatives  of  the  f^  are  non-random  functions  and  the 
Zj<  -  fj<  are  independently  normally  distributed  zero  mean  random  variables 
with  standard  deviations  5^. 


The  same  set  of  linear  equations  for  the  adjustments  to  first  guesses  for 
the  parameters  as  Equations  (2.6-5),  called  normal  equations  in  least  squares 
estimation,  can  be  obtained  from  Equation  (2.7-4)  with  a  first  order  Taylor 
expansion  for  fk  without  any  statistical  assumptions  or  second  partial 
derivatives.  Namely,  let  =  [oq0, ...,  amo]T  be  first  guesses  for  the  values  of 
the  parameters,  with  Equation  (2.6-3)  giving  the  adjustments  AtXj  towards  the 
least  squares  estimates  otj.  Assume  that 

N,  3fk(z;flo) 

fkfeffl)  =  fkfeSo)  +  Z  -J£~^-Aai  (2.7-6) 

H  3ai 

3fk(z;a)  afkfeiXo) 


Inserting  Equations  (2.7-6)  and  (2.7-7)  into  (2.7-4)  yields  Equations  (2.6-5),  with 
Ajjj  being  given  by  Equation  (2.7-5). 

Thus,  least  squares  estimation  involves  the  same  calculations  as 
maximum  likelihood  estimation,  but  if  the  statistics  of  the  measurement 
errors  are  known,  maximum  likelihood  theory  allows  the  Cramer-Rao 
lower  bound  to  be  applied,  with  a  lower  bound  for  the  covariance  of  the  least 
squares,  maximum  likelihood  estimates  being  the  inverse  of  the  coefficient 
matrix  of  the  normal  equations. 


35 


MAXIMUM  LIKELIHOOD  ESTIMATION  OF  FRACTIONAL  BROWNIAN  MOTION 


2.8  Relation  to  Other  Estimators 

The  maximum  likelihood  method  described  in  the  previous  sections  is 
known  as  classical  or  Fisher  maximum  likelihood  estimation.  The  state 
estimator  used  in  a  Kalman  filter  is  called  a  Bayes  estimator,  because  it 
involves  conditional  expectation.  Namely,  the  Kalman  filter  estimate  of  the 
state  x  at  a  given  time  tN  in  a  state  dynamic  system  is  the  expected  value  of 
the  state  given  the  observations  z\, ...,  zn  up  to  that  time  and  the  initial 
probability  density  of  the  states  at  time  to. 

If  a  state  dynamic  system  involves  unknown  parameters  fit  which  are 
to  be  estimated,  then  the  state  vector  x  could  be  augmented  by  states  a  with 
dynamic  equation  d&  =  0  +  noise.  However,  the  expanded  state  dynamic 
system  is  generally  nonlinear  (involving  products  of  x  and  a),  even  if  the 
original  system  equations  were  linear  in  x-  Nonlinear  systems  require  the 
implementation  of  an  extended  Kalman  filter.  Convergence  can  be  a  problem 
for  an  extended  Kalman  filter  estimate  of  x  and  fit  [23]. 

The  approach  taken  in  Chapter  6  to  estimate  the  parameters  a  in  a  state 
dynamic  system  is  to  apply  maximum  likelihood  estimation  to  the  probability 
density  p(z;&)  of  the  observables  z  as  a  function  of  the  parameters  &  generated 
by  running  the  Kalman  filter  to  estimate  the  states  assuming  values  for  &. 
This  probability  density  is  not  written  p(zl&),  because  no  conditional 
probabilities  are  involved,  as  there  is  no  prior  probability  information  about 
Sk¬ 
in  Chapter  7  maximum  likelihood  estimation  of  &  is  applied  to  a 
probability  density  p(z ;&)  arising  from  a  fractional  Brownian  noise  process, 
with  no  Kalman  filter  involved.  It  is  a  non-Markov  process  and  the  Kalman 
filter  derivation  depends  on  the  Markov  property  (see  Section  4.7). 

The  Bayesian  maximum  a  posteriori  probability  (MAP)  estimator  is 
slightly  different  from  the  Fisher  maximum  likelihood  estimator.  Instead  of 
maximizing  the  probability  of  the  measurements  as  a  function  of  the 
parameters,  this  method  consists  of  maximizing  the  probability  of  the 
parameters  as  a  function  of  the  measurements  [17].  This  probability  is  found 
using  Bayes'  rule 
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po'tf  ■  <*«> 

Note  that  the  negative  log-likelihood  function  of  Equation  (2.8-1)  is 

C(z;&)  =  -  ln[p(z  I  a)]  -  ln[p(fl)]  +  ln[p(z)]  (2.8-2) 

The  difference  between  this  equation  and  Equation  (2.2-4)  is  that 
Equation  (2.8-2)  requires  the  probability  densities  of  a  and  z.  If  there  is  no 
prior  knowledge  of  the  distribution  of  the  parameters,  this  distribution  must 
be  assumed  uniform  because  all  parameter  values  are  equally  likely.  In 
addition,  p(z)  must  also  be  assumed  uniform  because  this  distribution  is  a 
function  of  the  unknown  parameters  a.  In  this  case,  neither  p(a)  nor  p(z)  is  a 
function  of  the  parameters  themselves,  and  the  MAP  or  Bayesian  maximum 
likelihood  estimate  reduces  to  the  classical  Fisher  maximum  likelihood 
estimate. 
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3.1  Stochastic  Processes 

A  stochastic,  or  random,  process  consists  of  a  family  of  random 
variables  which  are  taken  from  a  probability  space  to  and  indexed  by  a 
parameter  t.  These  processes,  which  may  be  vector  valued,  are  commonly 
written  as  x(bto)  or  simply  x(t).  A  random  process  is  distinguished  from  a 
deterministic  process  by  the  fact  that  the  latter  is  known  exactly  over  the  time 
span  of  interest  while  the  former  involves  some  element  of  chance. 

A  stochastic  process  may  be  placed  into  one  of  four  categories  based 
upon  the  associated  probability  space  and  index  [18]  The  probability  space 
may  be  either  continuous  or  discrete,  and  the  index  may  also  be  either 
continuous  or  discrete.  Random  variables  drawn  from  a  discrete  probability 
space  may  only  take  discrete  values  though  there  may  be  an  infinite  number 
of  possible  values  to  choose  from.  The  set  of  integers  is  an  example  of  a 
discrete  probability  space.  A  continuous  probability  space  allows  random 
variables  to  take  any  value  within  a  specified  range.  The  set  of  real  numbers 
is  an  example  of  a  continuous  probability  space.  This  thesis  will  only  consider 
signals  defined  on  continuous  probability  spaces,  i.e.  systems  with  unknown 
parameters  that  may  take  any  value  within  a  given  range.  The  distinction 
between  continuous  and  discrete  indices  is  analogous  to  that  between 
continuous  and  discrete  time  systems.  In  fact,  the  index  will  be  considered  to 
be  tii  le  throughout  the  remainder  of  this  thesis. 
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A  sample  path  or  realization  of  a  stochastic  process  is  the  time  function 
x(t,co)  for  a  fixed  value  of  oo.  Different  sample  paths  of  the  same  stochastic 
process  will  not,  in  general,  be  identical.  The  set  of  all  possible  sample  paths 
of  the  process  is  called  its  ensemble. 


3.2  Autocorrelation  Function  of  a  Stochastic  Process 

One  way  to  characterize  stochastic  processes  is  to  use  correlation 
functions.  The  correlation  function  of  two  processes  x(t)  and  y(t)  is  defined  to 
be  the  expected  value  of  their  product  at  different  times  ti,  t2  [8]: 

4>xy(tl/t2 )  =  E{x(t])  y(t2)T}  (3.2-1) 

In  general,  (J)Xy  is  a  matrix  with  the  element  [<J)Xy]ij  given  by  the  scalar 
correlation  function 


foxyta'^iJ  =  E{xj(ti)yj(t2)}  (3.2-2) 

The  units  of  a  correlation  function  are  equal  to  the  product  of  the  units  of  the 
signals  of  interest. 

The  correlation  function  of  a  process  with  itself  is  called  its 
autocorrelation  function  and  is  given  by 

<}>xx(ti,t2)  =  E{x(ti)  x(t2)T}  (3.2-3) 

In  this  special  case,  the  diagonal  terms  are  actually  the  autocorrelations  of  the 
scalar  processes  which  make  up  the  vector  x(t)  and  the  off  diagonal  terms  are 
known  as  the  cross  correlations  between  the  individual  scalar  processes.  The 
autocorrelation  function  of  a  process  is  essentially  a  measure  of  the 
dependence  of  the  value  of  the  process  at  one  time  with  its  value  at  other 
times. 


The  covariance  function  of  a  random  process  is  found  by  subtracting 
the  mean  from  the  process  and  calculating  the  autocorrelation  of  the 
resulting  signal: 

cov[x(t!,t2)]  =  E{  [x(ti)  -  u(t1)][[x(t2)  -  U(t2)]j )  (3.2-4) 


40 


Chapter  3: _ Power  Spectral  Density  Analysis 


where 


ji(t)  =  Ek(t)}  (3.2-5) 

The  fact  that  the  autocorrelation  function  of  a  zero  mean  stochastic  process  is 
equal  to  its  covariance  will  be  of  use  in  Chapter  7. 

A  stochastic  process  is  called  stationary  if  its  probability  density  is  not  a 
function  of  time  [8].  This  means  that  none  of  the  statistics  of  the  process  will 
be  functions  of  absolute  time.  A  stationary  random  process  is  analogous  to  a 
time  invariant  deterministic  system.  The  autocorrelation  function  of  such  a 
process  will  depend  only  upon  the  difference  between  the  time  indices 
t  =  ti  -  t2-  The  autocorrelation  function  of  a  stationary  process  is  even,  i.e. 
<])xx(t)  =  <t)xx(-'t).  This  may  be  illustrated  using  the  change  of  variable  s  =  t  +  x : 

<t>xx(x)  =  E{x(t+x)x(t)}  =  E{x(s)x(s-t)}  =  (j)xx(-'t)  (3.2-6) 

Note  also  for  a  stationary  process  with  s  =  t  +  ^ 

<Mt)  =  E(x(t+x)x(t)}  =  E{x(s+f)x(s-^)}  (3.2-7) 

An  ergodic  process  is  a  stationary  process  that  has  the  additional 
property  that  time  averaging  over  a  particular  realization  is  equivalent  to 
ensemble  averaging,  so  that  in  particular 

T 

<t>xxCc)  =  lim^;  J  x(t+x)  x(t)  dt  (3.2-8) 

T-»°°  .j, 

for  any  realization  x(t)  [6].  This  is  a  useful  property  to  have  when  dealing 
with  experimental  data  because  it  is  not  common  to  have  a  large  set  of  sample 
paths  available  for  use  in  ensemble  averaging. 


3.3  Estimation  of  the  Autocorrelation  Function 

For  a  scalar,  ergodic,  stochastic  process,  given  a  span  of  data  x(t)  where 
-T  <  t  <  T,  a  change  of  variables  in  Equation  (3.2-8)  shows  that  an  estimator  for 
the  autocorrelation  function  is  given  by  [34] 
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(j)xx(x)  = 


2T 


x(t+|) x(t-f) dt  ,  Ixl  <2T 


(3.3-1) 


It  will  be  useful  in  Section  3.5  to  note  that  the  autocorrelation  given  in 
Equation  (3.3-1)  is  similar  to  a  convolution  of  x(t).  Appendix  A  reviews  the 
definitions  of  correlation  and  convolution  and  provides  some  useful 
relations. 


If  p(co)  is  the  probability  density  of  the  stationary,  ergodic  process  x(t) 
then  the  expected  value  of  the  estimate  (3.3-1)  is  given  by  [34] 

Ixl 


E{<MT)}  —  J 


T- 


-T+- 


J  x(t+|)  x(t-|)  dt 


p(co)  do) 


T- 


ixl 


-s  J 


-T+- 


I  x  I 


J  x(t+|)  x(t-|)  p(0))  doo 


dt 


T- 


2T 


J  <t>xx("t)  dt  by  Equation  (3.2-7) 


-T+- 


,  Ixl 

-  (1  "  tyxxW 


(3.3-2) 


Equation  (3.3-2)  shows  that  the  estimate  of  the  autocorrelation  function 
which  is  given  by  Equation  (3.3-1)  is  unbiased  in  the  limit  as  T  becomes  very 
large  with  respect  to  x  [34].  Multiplying  ^xx('t)  by  (1  -  Ixl  /2T)'1  will  also  make 
the  estimate  unbiased. 
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3.4  Power  Spectral  Density  of  a  Stationary  Stochastic  Process 

The  power  spectral  density  (PSD)  of  a  stationary  random  process  is 
defined  to  be  the  Fourier  transform  of  its  autocorrelation  function  [8] 

oo 

r  -2jtifi 

<hxx(f)  =  J  <j)xx(l)  6  dt  ,  -oo  <  f  <  oo  (3.4-1) 

-o© 

where  f  is  frequency  in  Hz  and  i  =  V-T.  The  PSD  has  units  equal  to  (units2) /Hz 
if  the  process  x(t)  is  measured  in  units. 

Because  the  autocorrelation  function  of  a  stationary  stochastic  process 
is  even,  the  PSD  is  also  even  so  that  the  one-sided  PSD  is  equal  to  [5] 

1  r  -2jrifr 

<hxx(f)  =  2  J  4»xx(x)  e  dt  ,  f  >  0  (3.4-2) 

-oo 

*L(f)  =  *XX(f)  +  ***W>  ,  f^O  (3.4-3) 

The  one-sided  PSD  is  often  used  in  analyzing  data  where  it  is  convenient  to 
regard  frequency  f  as  being  positive,  whereas  the  two-sided  PSD  is  more 
convenient  for  mathematical  proofs. 

As  will  be  shown  in  Chapters  4  and  5,  different  types  of  stochastic 
processes  have  their  own  characteristic  PSD  shapes.  This  makes  the  PSD 
useful  for  determining  the  frequency  range  where  a  given  process  is  more 
prominent  in  a  given  signal. 


3.5  Estimation  of  the  Power  Spectral  Density 
3.5,1  Continuous  Data 

It  is  often  desirable  to  determine  the  PSD  of  a  stationary,  ergodic,  scalar 
stochastic  process  from  a  finite  sample  path  of  that  process.  One  way  to  do 
this  is  to  take  the  Fourier  transform  of  the  estimate  of  the  autocorrelation 
function  defined  by  Equation  (3.3-1)  [34]: 
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Oxx(f) 


2T 


r  ~  ,  v  -2nifr 

J  0xx(x)e  dx 


-2T 


(3.5.1-1) 


or 


2T 

^xx(f)  =  J 

-2T 


2T 


x(t+|)  x(t-|)  dt 


-27tift 

e 


dx 


(3.5.1-2) 


Note  that  this  integral  is  only  evaluated  between  -2T  and  2T  because  the 
estimate  of  the  autocorrelation  function  is  only  defined  on  this  region  and  is 
assumed  to  be  zero  outside  of  it. 


Taking  advantage  of  the  fact  that  <j)  xx  is  an  autocorrelation  of  x(t), 
assumed  zero  for  1 1 1  >  T,  Equation  (A-6)  in  Appendix  A  shows  that 
Equation  (3.5.1-2)  may  be  expressed  as 

Oxx(f)  =  X(f)  x*(f) 

=  “  I  X(f)  1 2  (3.5. 1-3) 

where  X(f)  indicates  the  finite  Fourier  transform  of  x(t),  and  X*(f)  indicates  its 
complex  conjugate. 

The  finite  Fourier  transform  of  x(t)  is  given  by 

T 

X(f)  =  J  x(x)  e"27tlfT  dx  (3.5.1-4) 

-T 

when  x(t)  is  defined  over  the  interval  -T  <  t  <  T.  This  is  equivalent  to  taking 
the  integral  from  negative  to  positive  infinity  and  setting  x(t)  equal  to  zero 
outside  of  the  range  defined  by  T. 

The  one  sided  PSD  estimate  is  given  by 

&^(f)  =  4= '  X(f)  1 2  ,  f  ^  0  (3.5.1-5) 
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The  expected  value  of  the  estimate  of  the  PSD  is  given  by 


E{$xx(f)}  = 


1 


r  2t  „ 
J  ^xxCx)  6 

— 2T 


p(co)  dto 


2T  'fr 

=  J  E{$:xx(t)}  e  dx 

-2T 

2T  ,  , 

<■  ,  III  „  -2mfT 

=  J  a  -  — )  (j)xx(x)  e  dx  (3.5.1-6) 

-2T  Z1 

Equation  (3.5.1-6)  shows  that  the  estimate  of  the  PSD  will  be  unbiased 
as  T  becomes  very  large,  i.e.,  as  the  span  of  available  data  becomes  very  large: 


lim  E{4>xx(f)}  = 

T— >oo 


OO 

t  ,  v  -2jtift 

J  <t>xx(x)e  dx 

-oo 


—  <M0 


(3.5.1-7) 


3.5.2  Interpretation  of  the  PSD  as  a  Power  Spectrum 
The  power  in  a  time  function  x(t)  is 

T 

=  lim  f  I  x(t)  1 2  dt 
t-*~  j. 

T 

=  lim  ^  f  I  x(t)  1 2  dt 

T-^°  _T 

Let  X(f)  be  the  Fourier  transform  of  x(t): 

OO 

X(f)  =  J  x(t)  6  2nift  dt 

-OO 


Total  Power 


Average  Power 


(3.5.2-1) 


(3.5.2-2) 


(3.5.2-3) 
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The  inverse  Fourier  transform  is  [34] 

©o 

x(t)  =  ^  J  X(f)  e  2raft  df  (3.5.2-4) 

-OO 


Plancherel’s  formula  is  [40] 

OO  OO 

2ic  J  I  x(t)  1 2  dt  =  J  I  X(f)  1 2  df 

-OO  -OO 


(3.52-5) 


Thus  the  total  power  in  the  time  signal  x(t)  multiplied  by  2jc  is  the  same  as  the 
total  power  in  the  Fourier  transform  X(f),  so  I  X(f)  1 2  may  be  interpreted  as 
giving  the  power  split  up  into  frequency  components,  or  as  the  power  density 
as  a  function  of  frequency. 

If  x(t)  is  real,  then  X(f)  =  X(-f).  In  this  case,  the  power  over  all 
frequencies  is  equal  to  twice  the  power  in  the  positive  frequencies: 

OO  OO 

I  I  X(f)  1 2  df  =  2  J  I  X(f)  1 2  df  (3.5.2-6J 

— OO  0 

This  fact  confirms  the  validity  of  the  one-sided  PSD. 

The  PSD  as  the  Fourier  transform  of  the  expected  value  of  the 
autocorrelation  function  is  estimated  by  Equation  (3.5.1-3),  so  that 

<Dxx(f)  =  Oxx(f) 

=  ~  I  X(f)  1 2  (3.52-7) 

=  average  power  in  x(t)  broken 
into  frequency  components 
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3.5.3  Discrete  Data 


It  is  most  common  for  data  to  be  taken  at  discrete  times  using  a  digital 
computer.  Assuming  that  the  samples  of  the  continuous  process  x(t)  are 
evenly  spaced  in  time  from  -T  to  T,  Equation  (3.5.1-4)  is  approximated  by 

X(f)  =  X  xKk+^At-T]  e  At  (3.5.3-1) 

k  =  0 

where 

At  =  sec  (3.5.3-2) 

Define  the  discrete  function 

xk  =  x[(k+|)At-T]  ,  k  =  0, 1, ...,  N-l  (3.53-3) 


and  the  discrete  Fourier  transform 


N-l 

X  xke 


-2nijk/N 


k  =  0 


j  =  0, 1, ...,  N-l 


(3.53-4) 


Equation  (3.53-4)  has  a  rigorous  discrete  inverse  Fourier  transform  [6] 

xk  =  jjf  X  Xje2mjk/N  ,  k  =  0,1,...,  N-l  (3 .5.3-5) 
j  =  0 

For  j  =  0,  Equation  (3.53-4)  shows  that  is  the  average  or  mean  of  the  data. 
For  numerical  reasons,  the  mean  is  often  removed  from  the  data  without 
affecting  the  remaining  Fourier  transform  coefficients,  in  which  case  the  DC 
term  becomes  X0  =  0. 

The  discrete  equivalent  to  Plancheral's  theorem  is  Parseval’s  theorem 
[40] 

N-l  N-l 

X  IXjl2  =  N  X  lxk'2  (3.53-6) 

j  =  0  k  =  0 

This  theorem  shows  that  the  power  density  interpretation  of  the  continuous 
PSD  holds  for  the  discrete  PSD. 
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Substituting  Equation  (3.5.3-3)  into  Equation  (3.5.3-1)  yields 


X(f)  =  e 


-27iif(At/2  -  T) 


N-l 


X  xke 

k  =  0 


-2nifkAt 


At 


(3.5.3-7) 


so  that  except  for  the  phase  change  from  the  complex  exponential  in  front  of 
the  summation  sign  due  to  the  shift  in  the  time  origin  (which  does  not  affect 
the  PSD),  the  discrete  Fourier  transform  of  Equation  (3.5.3-4)  approximates  the 
continuous  Fourier  transform  at  discrete  frequencies  fj  by 

X(fj)  =  Xj  At  (3.5.3-8) 

with 

fj  =  NM  =  2T  Hz  (3.53-9) 


Using  sampled  data  over  a  time  span  2T  with  time  spacing  At  and 
2T  =  NAt,  the  discrete  Fourier  transform  is  thus  computed  at  a  frequency 
spacing  of 

Af  =  Jp  Hz  (3.5.3-10) 


If  x(t)  for  -«  <  t  <  °°  is  a  band-limited  signal  such  that  X(f)  =  0  for 
I  f  I  >  F,  and  Xj  is  the  discrete  Fourier  transform  of  samples  of  x(t)  at  a 
sampling  interval  At  with 


NAf 


N_  _  _N _ 

21  ~  NAt  “  At  “ 


2F 


then  x(t)  can  be  exactly  reconstructed  from  Xj  [11].  Thus  the  finite  Fourier 
transform  given  by  Xj  will  contain  all  of  the  magnitude  information  about  the 
time  series  x(t)  only  if  the  sampling  frequency  is  greater  than  2F.  This  is  the 
Nyquist  criterion.  If  the  time  series  x(t)  is  not  band-limited,  then  the  lower 
frequency  discrete  Fourier  transform  values  will  reflect  energy  from  the 
higher  frequency  components  of  the  signal.  This  phenomenon  is  known  as 
aliasing  [6]. 

If  the  data  x^  are  real.  Equation  (3.5.3-4)  shows  that 

XN.j  =  X  ,  j  =  1,2 . [y]  (3.5.3-11) 
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where  ^  y  J  is  the  largest  integer  less  than  or  equal  to  y  .  This  dependence  is 

due  to  the  Nyquist  criterion.  Information  is  only  available  about  the  power 
content  of  the  signal  at  frequencies  up  to  • 


The  fact  that  the  discrete  Fourier  transform  has  been  defined  for 

positive  frequencies  and  the  Nyquist  criterion  combine  to  limit  the  range  of 

1 

available  frequency  information  to  0  <  f  <  yy  .  Thus  by  Equations  (3.5.1 -5)  and 

(3.5.3-8),  the  estimate  from  a  finite  span  of  sampled  data  of  the  one-sided  PSD 
at  frequency  fj  is 


«>j)  = 


aivffi 


xx^[N/2] 


IXj|2  ,  j  =  o,i,...,[y]-i 

4^-  I^N/zil 2  if  N  is  odd 
>  =  2 

|^CN/2]| 2  if  N  is  even 
Z  1 


(3.5.3-12) 


(3.5.3-13) 


Equation  (3.5.3-13)  guarantees  that  the  one-sided  PSD  maintains  the 
symmetry  required  by  the  Nyquist  criterion. 


The  discrete  Fourier  transform  can  be  evaluated  using  the  Cooley- 
Tukey  Fast  Fourier  Transform  (FFT)  algorithm  if  N  is  a  power  of  2  [6].  This 
well  known  algorithm  greatly  reduces  the  computational  burden  of 
calculating  the  discrete  Fourier  transform,  with  the  number  of  computations 
being  proportional  to  N  log2  N  rather  than  the  N2  required  for  the  brute  force 
evaluation  of  Equation  (3.5.3-4)  [6]. 


3.6  Frequency  Averaging 

In  Section  3.5.1,  it  was  shown  that  the  estimate  of  the  power  spectral 
density  is  unbiased.  However,  it  can  also  be  shown  that  for  most  processes 
that  are  of  interest,  the  standard  deviation  of  this  estimate  at  a  given 
frequency  is  equal  to  the  actual  value  of  the  PSD  at  that  frequency  [6],  [17],  [33]. 
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This  makes  Equation  (3.5. 1-3)  a  poor  estimate  of  Oxx(f).  The  standard 
deviation  of  the  error  is  not  a  function  of  the  time  interval  of  available  data 
[6],  [17],  [33].  This  means  that  no  matter  how  large  T  becomes,  the  standard 
deviation  will  not  improve.  The  variance  of  the  estimate  may  be  decreased 
by  either  averaging  several  estimates  or  averaging  adjacent  frequencies  of  a 
given  estimate.  The  former  method  is  known  as  ensemble  averaging,  while 
the  latter  is  called  frequency  averaging. 

Frequency  averaging  will  be  employed  in  this  thesis  because  a 
collection  of  sample  paths  is  not  always  available  for  processing.  This  method 
is  valid  because  the  errors  in  the  estimates  of  adjacent  points  on  the  PSD  are 
nearly  independent  [17].  Frequency  averaging  has  the  disadvantage  of 
decreasing  the  resolution  of  the  PSD  because  several  points  on  the  graph  are 
replaced  by  one.  However,  because  a  PSD  is  commonly  plotted  using  a  log-log 
scale,  this  loss  of  detail  may  be  utilized  to  eliminate  the  "bunching  up"  of 
points  at  the  high  frequency  end  of  the  graph  where  the  human  eye  typically 
cannot  resolve  individual  frequencies.  This  is  done  by  averaging  fewer 
points  at  the  lower  frequencies  and  more  at  the  higher  end.  This  process 
results  in  a  PSD  that  is  more  accurate  at  the  higher  frequencies  because  more 
points  are  involved  in  averaging,  but  which  still  retains  good  resolution  at 
lower  frequencies. 

The  averaging  system  used  in  this  thesis  is  described  by  the  Table  3.4-1 
[22].  The  first  32  PSD  data  points  are  not  averaged.  Starting  with  the  33rd 
point,  the  PSD  data  is  divided  into  sets  of  points  with  indices  that  range  from 
2k_1  +  1  to  2k.  The  points  within  each  range  are  then  averaged  to  leave  only 
16  points.  This  system  reduces  the  number  of  points  in  the  PSD  to 
approximately  50  points  per  decade. 
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Table  3.4-1  Logarithmic  Frequency  Averaging  of  32,768  Points 


Data  Interval 

Points  in  Interval 

Number  of  Points 
per  Averaging 

Number  of 
Points  to  Plot 

1-32 

32 

1 

32 

31-64 

32 

2 

16 

65  - 128 

64 

4 

16 

129  -  256 

128 

8 

16 

257-512 

256 

16 

16 

513  - 1024 

512 

32 

16 

1025  -  2048 

1024 

64 

16 

2049  -  4096 

2048 

128 

16 

4097  -  8192 

4096 

256 

16 

8193  - 16384 

8192 

512 

16 

16385  -  32768 

16384 

1024 

16 
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Markov  Stochastic  Processes 


4.1  Martingale  and  Markov  Processes 

A  martingale  is  a  stochastic  process  x(t,w)  for  which  [13] 

E{x(t2)  I  x(ti)}  =  x(t-[)  fort2>ti  (4.1-1) 

where  E[*i«]  denotes  conditional  expectation.  In  other  words,  for  a 
martingale  the  expected  value  of  the  future  given  the  present  is  equal  to  the 
present.  The  name  martingale  is  the  French  term  for  the  gambling  strategy  of 
redoubling  a  bet  until  a  win  is  obtained.  This  eventually  has  to  occur  in  a  fair 
game,  except  that  the  house  always  has  the  odds  slightly  in  its  favor,  and 
except  for  the  theorem  of  gamblers  ruin.  This  states  that  in  a  fair  game  a 
gambler  with  a  finite  stake  betting  against  a  house  with  an  infinite  stake  will 
eventually  lose  his  fortune  with  probability  1  [12]. 

A  Markov  stochastic  process  x(t,o)),  as  defined  by  Markov  around  1906 
[2],  is  one  for  which  [18] 

E[x(tn+i)  lx(tn),  x(tn_i),  ...,x(ti)}  =  E{x(tn+i)  lx(tn)}  (4.1-2) 

In  other  words,  for  a  Markov  process  the  future  depends  on  the  present  and 
not  on  the  complete  past  history  of  the  process.  This  property  is  similar  to  the 
principle  of  causality  in  physics,  where  differential  equations  predicting  the 
future  behavior  of  a  system  depend  only  on  the  initial  conditions  of  the 
system  independently  of  how  the  system  reached  those  initial  conditions. 
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Presuming  the  validity  of  the  principle  of  causality,  one  would  expect 
to  be  able  to  model  the  noise  processes  in  physical  systems  as  Markov 
processes.  However,  if  a  system  is  modeled  with  an  incomplete  set  of  states, 
then  the  initial  conditions  of  the  reduced  set  of  states  do  not  completely 
specify  the  future.  Hence  an  incomplete  set  of  noise  states  could  lead  to  what 
seems  to  be  a  non-Markov  process. 

Chapter  5  defines  fractional  Brownian  motion  as  an  example  of  a  non- 
Markov  process.  In-so-far  as  the  log-log  PSD  slopes  in  fractional  Brownian 
motion  are  seen  in  experimental  data,  one  could  suppose  that  the  true 
underlying  physical  process  could  be  modeled  with  Markov  noise  states,  if 
enough  of  them  are  included.  However,  it  might  be  inconvenient  to  include 
a  sufficient  number  of  states  to  model  a  distributed  parameter  system. 

The  remainder  of  this  chapter  discusses  Markov  processes,  in  particular 
the  Brownian  motion  process  and  the  processes  derived  from  it. 


4.2  Weiner  Brownian  Motion  Process 

A  Weiner  process  or  Brownian  motion  process  or  random  walk 
process  p(t,co)  is  a  stochastic  process  such  that  [18] 

(1)  P(t)  has  stationary  independent  increments,  that  is  if 
tj  <  t2  <  t3  <  t4,  then  P(t2)  -  P(tj)  is  independent  of  P(t4)  -  p(t3), 
and  the  probability  distribution  of  P(t2)  -  P(tj)  only  depends 
on  t2  - 1|  and  not  on  t^  and  t2  individually. 

(2)  For  every  t,  P(t)  is  normally  distributed  with  zero  mean. 
Hence  the  increment  p(t2)  -  p(ti )  is  normally  distributed  with 
zero  mean,  and  by  (1)  the  variance  of  the  increment 
P(t2)  -  P(t-i)  depends  only  on  t2  -  tj. 

(3)  One  can  specify  that  Pr[p(0)=0]  =  1,  and  only  consider  P(t)  for 
t  >  0.  However,  one  can  also  consider  p(t)  for  -oo  <  t  < 
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A  stochastic  process  with  these  characteristic  properties  can  be 
mathematically  constructed  [20],  so  the  Weiner  process  exists.  It  is  a 
martingale,  because  for  t2  >  t| 

E{p(t2)lp(t1)}  =  E{  [p(t2)-p(tT)]  +  [p(t!)  -  (3(0)]  I  [pft,)-p(0)]} 

=  E{  CP(t-|)  -  p(0)]  I  CP(t-|)  -  (3(0)] } 

=  P(t!)  (4.2-1) 

by  the  stationary  independent  increment  property  and  assuming  (3(0)  =  0. 

Among  the  properties  of  a  Weiner  process  that  can  be  derived  from  the 
defining  properties  are  that  any  given  sample  path  is  almost  surely  (with 
probability  1)  continuous  and  almost  surely  non-differentiable  [18]. 

The  Weiner  process  models  the  behavior  of  the  Brownian  motion  of  a 
small  particle  suspended  in  a  fluid  and  continually  buffeted  by  collisions  with 
fluid  molecules,  as  first  observed  for  pollen  particles  in  water  by  the  biologist 
Brown  in  1828.  Einstein  worked  out  some  of  the  statistical  mechanical 
characteristics  of  Brownian  motion  in  1905,  Weiner  put  it  on  a  firm 
mathematical  footing  in  the  1920s,  and  Levy  further  investigated  its 
properties  in  the  1930s  and  1940s.  [19] 

There  are  trillions  of  molecular  collisions  per  second  with  a  particle 
undergoing  Brownian  motion.  In  between  collisions,  the  particle  motion  is 
differentiable  in  classical  physics.  Because  the  collision  of  a  molecule  with  a 
particle  actually  only  involves  the  interaction  of  force  fields,  the  motion 
through  a  collision  can  also  be  regarded  as  being  differentiable.  Therefore,  a 
physical  Brownian  motion  sample  path  is  everywhere  differentiable,  whereas 
the  stochastic  process  which  models  it  very  well  and  has  useful  properties, 
e.g.,  for  defining  stochastic  integrals,  is  nowhere  differentiable. 

Let  v(t2-tj)  be  the  variance  of  (3(t2)  -  (3(ti ).  By  the  stationarity  and 
independence  of  the  increments, 

v(2t)  =  E{  [P(t+2x)  -  P(t)]2  } 

=  E{  [(P(t+2x)  -  P(t+x))  +  (P(t+x)  -  P(t))]2  } 
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=  E{  [p(t+2x)  -  P(t+t)]2  }  +  E{  [P(t+T)  -  P(t)]2  } 

=  2  v(x)  (4.2-2) 

Similarly  v(nx)  =  nv(x)  and  v(x/n)  =  v(i)/n  for  any  integer  n,  so  that 
v(ax)  =  av(x)  for  any  rational  number  a,  or  any  real  number  a  if  v(x)  is 
assumed  continuous.  Thus  v(x)  =  v(l*x)  =  v(l)x,  so  that  if  v(x)  is  continuous 

v(x)  =  E{  [P(t2)  -  P(t1)]2 }  =  a2  1 12  -  ta  I  (4.2-3) 

where  o  is  called  the  standard  deviation  and  the  variance  of  the  Weiner 
process  [9],  [35]. 

The  autocorrelation  function  of  a  Weiner  process  for  tj  <  t2  assuming 
P(0)  =  0  is 

4>pp(ti/t2)  =  E{p(t1)p(t2)} 

=  E{  P(t!)[p(t,)  +  (p(t2)  -  p(t!))] } 

=  E{  [P(ta)  -  P(0)][(p(t!)  -  P(0))  +  (p(t2)  -  P(t!)] } 

=  o2  t!  (4.2-4) 

so  that 

<t>pp(tirt2)  =  a2min(t1,t2)  (4.2-5) 


4.3  White  Noise  PSD  Slope 

White  noise  is  a  zero  mean,  Gaussian,  stationary,  stochastic  process 
x(t,to)  which  has  a  delta  function  for  its  autocorrelation  function.  The  Fourier 
transform  of  a  unit  delta  function  is  1  for  all  frequencies.  Thus  the  two-sided 
PSD  of  white  noise  has  a  constant  level  of  a2 /Hz  for  all  frequencies.  This 
stochastic  process  is  called  white  noise  in  analogy  with  white  light,  which  has 
more  or  less  equal  energy  at  all  visible  frequencies.  The  log-log  PSD  slope  of 
white  noise  is  0. 
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Having  a  constant  PSD  level  for  all  frequencies  is  a  physical 
impossibility,  because  it  implies  infinite  power.  However,  the  concept  of 
white  noise  is  mathematically  useful  for  modeling  physical  systems.  This  is 
done  in  two  ways. 

First,  for  sampled  observations,  measurement  noise  is  often  assumed 
to  be  independent  from  one  discrete  sample  time  to  the  next.  This  whiteness 
assumption  causes  no  physical  difficulty  because  of  the  discrete  sample  times. 

Second,  white  noise  is  introduced  as  a  driving  force  in  state  differential 
equations  through  the  perfectly  rigorous  definition  of  the  stochastic  integral 
with  respect  to  the  differential  (rather  than  derivative)  of  a  Weiner  process. 
The  state  dynamics  then  shape  the  white  driving  noise  to  give  a  colored  noise 
response.  Here,  the  term  colored  noise  refers  to  a  stochastic  process  which  has 
different  energy  content  (or  PSD  level)  at  different  frequencies,  in  analogy 
with  colored  light. 


4.4  PSD  Slope  of  Random  Walk,  Trend,  and  Quantization  Noise 

A  Weiner  or  random  walk  process  has  stationary  independent 
increments,  but  itself  is  not  stationary.  Hence  its  PSD  is  not  defined  in  the 
sense  of  the  Fourier  transform  of  a  stationary  autocorrelation  function. 
However,  a  PSD  can  be  computed  from  the  magnitude  squared  of  the  Fourier 
transform  of  a  finite  segment  of  a  random  walk  sample  path. 

Either  by  considering  approximations  to  a  Weiner  process  or  by  doing 
formal  manipulations  with  its  correlation  function,  it  can  be  shown  that  both 
the  approximate  and  formal  derivative  of  a  Weiner  process  is  white  noise 
[18].  Thus,  the  PSD  of  the  Weiner  process  has  a  -2  log-log  slope.  This  can  be 
seen  by  using  integration  by  parts  in  the  Fourier  transform  of  the  derivative 
of  the  Weiner  process  g(t)  with  g(-«>)  =  g(«0  =  0, 

oo  OO 

J ^dt^  e'27rift  dt  =  2n:if  J  g(t)  e'2nift  dt  (4.4-1) 

-oo  -oo 
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Thus  the  Fourier  transform  of  random  walk  is  equal  to  times  that  of 
white  noise.  Using  Equation  (3.5.1-3)  to  arrive  at  the  PSD  shows  that  the  PSD 
of  random  walk  is  equal  to  — ^  multiplied  by  the  PSD  of  white  noise.  This 
has  the  effect  of  decreasing  the  log-log  slope  of  the  PSD  by  two.  Because 
Equation  (4.4-1)  holds  for  any  function  g(t),  integrating  any  function  will 
decrease  its  log-log  PSD  slope  by  two,  and  equivalently  differentiating  a 
function  will  increase  its  PSD  slope  by  the  same  amount. 

Now  consider  the  Fourier  transform  of  any  finite  segment  of  a  trend 
at,  -T  <  t  <  T: 

T 

X(f)  =  J  at  e'2nift  dt  (4.4-2) 

-T 

Equations  (3.5.3-2)  and  (3.5.3-9)  motivate  the  change  of  variables 

f  =  /  t  =  nAt  =  n—  (4.4-3) 


where,  for  the  moment,  j  and  n  are  allowed  to  vary  continuously.  This  leads 
to 


X(j) 


N/2 

J  anAte 

-N/2 


-2nijn/N 


Atdn 


(4.4-4) 


Using  integration  by  parts  Equation  (4.4-4)  becomes 

x®  =  +  e"1)  +  -  e  ”•)]  (4.4-5) 


If,  at  this  point,  j  is  restricted  to  take  on  only  integer  values  as  it  would 
in  the  discrete  Fourier  transform,  this  equation  reduces  to 


X(j) 


a  At2  N2 
2nj 


(-1)1 


i 


2aT2 

"j 


(-l)j  i 


(4.4-6) 


Therefore,  by  Equation  (3.5. 1-3)  the  value  of  the  PSD  at  frequencies  spaced  ^ 
Hz  apart  is  given  by 
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*xx(f)  =  ^-T  (4.4-7) 

which  will  have  a  -2  log-log  slope. 

Because  both  random  walk  and  trend  have  a  -2  log-log  PSD  slope,  they 
cannot  be  distinguished  using  frequency  domain  analysis.  This  is  one 
motivation  for  developing  the  maximum  likelihood  estimator  of  Chapter  6 
which  is  capable  of  separating  the  trend  and  random  walk. 


Quantization  noise  results  from  discretization  of  continuous  signals. 
The  PSD  of  quantization  noise  is  derived  in  this  section  because  it  is  present 
in  the  experimental  data  presented  in  Section  6.8.1. 


If  a  continuous  signal  is  being  measured  by  a  digital  device,  the  true 
value  of  the  signal  will  be  truncated  to  the  number  of  decimal  places  stored  by 
the  instrument.  Under  the  assumption  that  the  value  of  the  signal  is 
changing  rapidly  with  respect  to  the  level  of  quantization,  this  is  equivalent 
to  subtracting  a  uniformly  distributed  random  variable  n(t)  from  the  true 
value  of  the  signal.  The  probability  distribution  of  n  is  given  by 


Cl 


p[n(t)]  =  \ 


q 


10 


if  0  <  n  <  q 
else 


(4.4-8) 


where  q  is  the  resolution  of  the  measurement  device.  Quantization  noise  is 
assumed  to  be  uncorrelated  in  time  so  that 

(JJnnCc)  =  E{n(t+x)n(t)} 

=  E{n(t)2}  8(t) 

=  f  5(x)  (4.4-9) 


where  5(x)  is  the  delta  function. 

The  PSD  of  quantization  noise  is 
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oo 

<I>nn(0  =  J  4>nn(T)  e'2”ifr  dx 

— OO 


(4.4-10) 


As  expected,  this  PSD  is  a  constant  because  the  quantization  noise  is  assumed 
to  be  uncorrelated  in  time. 

If  a  measurement  y(t)  is  made  up  of  quantization  noise  subtracted  from 
a  stationary  signal  x(t),  then  its  autocorrelation  function  is  given  by 

<|>yy(i)  =  E{  y(t+x)y(t) } 

=  E{  [x(t+x)  -  n(t+x)]  [x(t)  -  n(t)] } 

=  E{  x(t+x)x(t) }  +  E{  n(t+x)n(t) } 

=  +  <?>rm(x)  (4.4-11) 

if  the  x(t)  and  n(t)  are  independent. 

The  PSD  of  y(t)  is  then  given  by 

oo 

<&yy(f)  =  J  t^xxfa)  +  $nnCt)]  G  ^  dx 


-  ^xx(f>  +  (4.4-12) 

As  a  result  of  Equation  (4.4-12),  PSDs  which  are  calculated  from  experimental 
data  will  show  a  flattening  to  zero  at  the  higher  frequencies  where 
quantization  noise  becomes  dominant  unless  a  low-pass  filter  is  used  to 
remove  it. 

In  the  experimental  data  used  in  Chapter  6,  quantization  noise  shows 
up  with  a  +2  log-log  slope.  This  is  because  the  PSD  was  produced  using  the 
difference  of  successive  samples  that  were  corrupted  by  quantization  noise. 
Recall  that  Equation  (4.4-1)  shows  that  differentiation  of  the  signal  of  interest 
increases  the  log-log  PSD  slope  by  two,  and  that  the  differencing  operation  is 
essentially  differentiation. 
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4.5  Ito  Stochastic  Integral 

If  g(t,(o)  is  a  random  function,  and  (3(t,co)  is  a  Weiner  process  such  that 
g(s,co)  is  independent  of  p(t,co)  for  s  <  t,  then  the  Ito  stochastic  integral  of  g  with 
respect  to  d{5  for  given  sample  paths  is  defined  to  be  [18] 

b  n-l 

f  g(t,to)  d|3(t,co)  =  lim  J  g(t;,co)  [(3(t1+1,co)  -  pfeco)]  (4.5-1) 

a  mesh— >0  j=i 

where  a  =  t|  <  t2  <  ...  <  tn  =  b  is  a  partition  of  the  interval  [a,b]  with 
mesh  =  max  (tj+j-tj).  In  this  definition,  it  is  imprecise  in  what  sense  the  limit 
is  taken.  In  Reference  [18]  it  is  taken  in  the  sense  of  limit  in  the  mean  (l.i.m.) 
for  the  mean  square  integral  calculus.  The  completely  rigorous  definition 
utilizes  the  machinery  of  Lebesgue  integration  with  limits  in  probability  [30]. 
Note  that  the  differential  notation  d(5  does  not  involve  taking  a  derivative, 
and  has  a  rigorous  meaning  in  the  definition  of  the  stochastic  integral. 

The  Stratonovich  stochastic  integral  has  the  function  g  evaluated  at  the 
midpoint  of  each  sub-interval  in  the  partition,  instead  of  the  left  end  point 
used  in  the  Ito  definition.  Formal  properties  of  calculus  result  for,  e.g., 
changes  of  integration  variable.  However  the  Ito  definition  is  preferred,  even 
though  formal  calculus  formulas  are  lost,  because  it  is  defined  on  a  larger 
class  of  functions  and  good  probability  properties  result,  including  the  fact 
that  the  Ito  stochastic  integral  is  a  martingale.  [2] 

The  Ito  and  Stratonovich  stochastic  integrals  are  the  same  for  a  non- 
random  function  g(t)  and  thus  both  may  be  evaluated  using  the  formal  rules 
of  calculus  as  if  p(t)  were  continuously  differentiable  [18].  Weiner  had  defined 
the  stochastic  integral  in  this  case  [10].  The  Ito  and  Stratonovich  integrals  give 
different  results  for  a  random  function  g(t,co),  in  particular  different  expected 
values  over  the  ensemble  of  all  sample  paths  [2]. 
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4.6  Stochastic  Differential  Equations 

The  stochastic  differential  equation  for  a  vector  state  x  and  vector 
Weiner  process  £  is  written  in  terms  of  differentials  as 

dx  =  F(x,t)  dt  +  G(x,t)  d£(t)  (4.6-1) 

This  notation  stands  for  the  Ito  stochastic  integral  equation 

b  b 

x(b)  -  x(a)  =  J  F(x,t)  dt  +  J  G(x,t)  d£(t)  (4.6-2) 

a  a 

If  the  functions  F  and  G  satisfy  Lipschitz  and  other  conditions,  and  if 
the  random  variable  x(a)  with  E[x(a)2]  <  «>  is  independent  of  £(t)  for  t  >  a,  then 
there  is  a  unique  Markov  stochastic  process  solution  x(t)  of  Equation  (4.6-2)  in 
the  mean  square  sense  such  that  x(t)  -  x(a)  is  independent  of  £(x)  for  x  >  t  [18]. 

This  result  depends  on  the  stationary  independent  increment  property 
of  the  scalar  process  (3(t)  with  E{  (P(t2)  -  p(ti)]2 }  =  a  1 12  -  ti  I ,  where  a  is  the 
standard  deviation  of  the  increments.  The  stochastic  integral  cannot  be 
defined  with  fractional  Brownian  motion  Ph  (see  Chapter  5)  replacing  p,  since 
its  increments  are  not  independent,  even  though  they  are  stationary,  and 
E{  [PH(t2)  -  pH(tl)]2  }  =  O  1 12  -  tl  I  2H. 


4.7  Kolmogorov  Fokker-Planck  Partial  Differential  Equations  and 
Derivation  of  the  Kalman  Filter 

The  propagation  of  the  probability  density  of  the  initial  condition  x(a,co) 
of  Equation  (4.6-2)  into  the  probability  density  of  the  solution  x(t,(o)  can  be 
shown  to  satisfy  a  parabolic  partial  differential  equation  called  the 
Kolmogorov  Fokker-Planck  equation  [18].  If  the  state  equation  (4.6-1)  or 
(4.6-2)  is  linear  and  the  initial  condition  Gaussian,  then  the  rigorous  formulas 
for  the  propagation  of  the  resulting  Gaussian  probabilty  density  lead  to 
equations  for  the  propagation  of  the  Gaussian  mean  and  covariance,  and  then 
to  Kalman  filter  formulas  for  the  Bayes  estimation  of  the  mean  and  its 
covariance  conditioned  on  observations  [18]. 
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4.8  Exponentially  Correlated  Noise 

Consider  the  stochastic  differential  equation 

dx  =  -ci  x(t)  dt  +  C1C2  d(3(t)  (4.8-1) 

As  noted  in  Section  4.5,  because  the  coefficients  of  d(5(t)  are  nonrandom,  this 
equation  may  be  evaluated  using  the  formal  rules  of  calculus.  Thus  the 
equation  reduces  to 

2 

dx  C2Cl 

~iY  =  -Cl  x(t)  +  C1C2  w(t)  ,  x(0)  ~  N(0,-^-)  (4.8-2) 

so  that  [18] 

t 

x(t)  =  x(0)  +  ciC2  J  e*ci(t's)  w(s)  ds  (4.8-3) 

0 


where  w(t)  is  a  zero  mean,  white,  Gaussian  forcing  function.  The  variance  of 
the  initial  condition  is  chosen  for  convenience  in  deriving  the 
autocorrelation  function.  Because  x(0)  and  w(t)  have  mean  zero,  x(t)  also  has 
mean  zero.  In  this  thesis,  ci  will  be  referred  to  as  the  inverse  or  reciprocal 
time  constant  and  C2  as  the  scaling  parameter. 


The  autocorrelation  function  for  x(t)  assuming  ti  >  t2  is  [18] 
$XX  (ti,t2)  =  E{  x(ti)x(t2) } 


tl 

x(o)  c-01*1  +  cic2  J 

0 


g-Cl(tl-s)  dg 


X 


*2 

xlC^C01*2  +  ciC2  J  G"01^2"^  w(^)  dE,  ' 
0  J  . 


tl  t2 

=  C2(ci/2)e'ci{ti+t2)  +  Cft  J  J  e-ci(t!-s)  e-ci(t2-^)  5(s^)  ds 

0  0 
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t2 

=  4  (ci/2)  +  44  j  e'ci(t2-^> 

0 

=  c^ci/^e^'^  ,  ti  >  t2  (4.8-4) 

The  fact  that  the  correlation  function  for  x(t)  is  an  exponential  has  lead  to  its 
being  called  exponentially  correlated  noise. 

Because  exponentially  correlated  noise  is  stationary,  its  PSD  may  be 
calculated  directly  as  the  Fourier  transform  of  its  autocorrelation  function: 

oo 

o>xx(f)  =  \  4  (ci/2)  e'cikl  e-27tift  dt 


J_  (ClC2)2 
2tt  f2  +  (ci/27t)2 


(4.8-5) 


where  i  =  ti  -  t2.  This  function  will  have  a  log-log  slope  of  zero  for 
frequencies  below  (ci/2k),  near  this  frequency  it  will  begin  to  transition  to  a  -2 
log-log  slope.  Because  exponentially  correlated  noise  has  a  PSD  which  shows 
more  energy  in  some  frequencies  than  others,  it  is  often  referred  to  as  colored 
noise. 


4.9  Simulation  of  Markov  Noise  Processes 

The  Markov  noise  processes  that  were  described  in  the  preceding 
sections  may  all  be  simulated  using  a  digital  computer.  Computer  generated 
sample  paths  will  be  useful  for  evaluating  the  performance  of  the  algorithms 
presented  in  Chapters  6  and  7.  Subroutines  that  produce  independent,  zero 
mean,  unit  variance,  normally  distributed  random  variables,  and  others  that 
return  independent,  uniformly  distributed  random  variables  are  commonly 
available  in  software  libraries.  These  subroutines  make  use  of  random 
number  generation  algorithms. 
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White  noise  is  a  sequence  of  normally  distributed  random  variables, 
while  random  walk  and  exponentially  correlated  noise  are  simple  functions 
of  a  white  noise  sequence.  Quantization  noise  may  be  simulated  using  a 
random  number  generator  with  a  uniform  distribution  or  by  truncating  data 
to  a  given  precision. 

A  zero  mean  white  sequence  is  formed  using  the  following  formula: 

y(ti)  =  cw(ti)  (4.9-1) 

where  0  is  the  desired  standard  deviation  of  the  sequence  and  w(tj)  is  a  zero 
mean,  unit  variance,  Gaussian  random  variable  from  a  random  number 
generator. 

A  random  walk  sample  path  may  be  simulated  by 

y(h)  =  y(ti-i)  +  b  At1/2  w(tj)  (4.9-2) 

where  b  is  the  random  walk  standard  deviation  parameter  and  At  is  the 
desired  time  step. 

Exponentially  correlated  noise  is  generated  using  the  discretized  form 
of  Equation  (4.8-2): 

y(h)  =  e~ClAt  y(ti-i)  +  C1C2  At1/2  w(ti)  (4.9-3) 

This  process  may  also  be  simulated  using  a  first  order  approximation  to 
Equation  (4.9-3): 

y(tj)  =  y(ti-i)-ciy(tj-i)At  +  ciC2At1/2w(ti)  (4.9-4) 
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Fractional  Brownian  Motion  as  an  Example 
of  a  Non-Markov  Stochastic  Process 


5.1  Motivation  for  Defining  Fractional  Brownian  Motion 

Fractional  Brownian  motion  (fBm)  is  a  generalization  of  the  more 
familiar  Brownian  motion  (Weiner)  process  that  was  defined  in  Section  4.2. 
The  definition  of  fBm  was  motivated  by  the  fact  that  there  are  many  signals 
which  occur  in  physical  systems  that  have  PSDs  with  log-log  slopes  between  0 
and  -2  over  a  very  wide  range  of  frequencies.  Such  processes  cannot  be 
modeled  by  a  combination  of  a  reasonable  number  of  Markov  processes. 

A  PSD  which  is  proportional  to  fP  over  all  frequencies  for  -2  <  p  <  0  is 
indicative  of  a  long  term  dependence  in  the  data  [26].  This  means  that  the 
value  of  the  process  at  the  current  time  is  highly  dependent  upon  all  past 
values  of  the  process.  This  is  why  Markov  processes  cannot  model  systems  of 
this  type,  because  the  future  value  of  a  Markov  process  is  only  dependent 
upon  its  current  value. 

Some  examples  of  systems  which  exhibit  a  power  spectral  density 
proportional  to  fP  are  semiconductors  [36],  loudness  and  pitch  fluctuations  in 
music  and  speech  [43],  seismic  reflections  [42],  sunspot  activity  [27],  and  the 
classic  example  of  Nile  river  flooding  [27].  While  the  physical  processes  of 
these  systems  are  different,  fBm  provides  a  common  mathematical  model 
which  does  characterize  them  in  a  useful  way- 
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5.2  Definition  of  Fractional  Brownian  Motion 

The  definition  of  fBm  'vas  proposed  by  Mandlebrot  and  van  Ness  as  a 
modification  of  an  earlier  definition  by  Barnes  and  Allen  [4].  An  fBm  process 
is  defined  by  the  following  equations  [26]: 

Ph(0,co)  =  bG 

f  0 

pH(t,0))-  PhCO/o)  =  f(H!+  5)  j  |  [(t-s)H_1/2  _  (_s)H-l/2]  dp(s,CO) 

l-oo 

t 

+  J  (t-s)H_1  /2  dp(s,co)  •  (5.2-1) 

0 

In  this  definition,  P(t,co)  is  a  unit  variance  Brownian  motion  or  Weiner 
process  at  time  t  with  probability  space  co,  dp(t,co)  is  its  differential  for  the  Ito 
stochastic  integral,  and  T(«)  is  the  gamma  function.  The  parameter  H  is  the 
fBm  parameter  which  will  define  the  slope  of  the  fBm  PSD.  As  Section  5.6.2 
will  show,  the  log-log  slope  of  the  PSD  will  be  equal  to  -(1+2H).  The 
definition  is  valid  for  0  <  H  <  1.  In  this  thesis,  it  will  be  assumed  that  bQ  =  0. 
The  fact  that  the  integral  in  Equation  (5.2-1)  is  evaluated  from  minus  infinity 
to  the  current  time  makes  fBm  a  non-Markov  process,  except  in  the  case 
where  H  =  |  where  Pi  /2  is  ordinary  Brownian  motion: 

t 

Pl/2(hw)  =  J  dP(s,io)  (5.2-2) 

0 

Because  fBm  is  based  upon  Brownian  motion,  it  is  a  zero  mean  Gaussian 
process. 

Equation  (5.2-1)  is  actually  the  fractional  integral  [32]  of  the  white  noise 
process  dp(t,co)  which  would  more  commonly  be  written  as  [26] 


68 


Chapter  5:  fBm  as  an  Example  of  a  Non-Markov  Stochastic  Process 


pH(t2/<»)  -  pH(tl/«) 


1 

r(H+.5) 


f  *2 


J  [(t2-s)H-l/2]dp(S/0)) 


tl 


-  |  (tl-s)H-l/2dP(s,C0) 


(5.2-3) 


except  that  the  individual  integrals  in  Equation  (5.2-3)  are  divergent,  in 
contrast  with  the  combined  integrals  in  Equation  (5.2-1). 


5.3  Self-similarity  of  fBm 

A  useful  property  of  fBm  is  that  its  increments  are  self-similar.  Self¬ 
similarity  means  that 

p[Pn(t+aAt)  -  Ph(01  =  p[AtHPn(a)]  (5.3-1) 

where  p(«)  indicates  a  probability  density  [26].  Note  that  this  equation  also 
implies  that  the  fBm  increments  are  stationary,  though  they  are  only 
independent  when  H  =  ^  ■ 


Without  loss  of  generality.  Equation  (5.3-1)  can  be  shown  to  be  true  for 
bG  =  0  and  t  =  0.  Using  Equation  (5.2-1)  and  dropping  the  explicit  dependence 
on  the  probability  space  co  for  simplicity 

f  0 

Pn(aAt)  =  A  J  [(aAt-s)H-l/2-(.s)H-l/2]dp(s) 


aAt 

+  J  (aAt-s)H'1//2  dP(s)  • 
0 


(5.3-2) 


s 

The  change  of  variable  s'  =  ^  leads  to 
dp(s')  =  At1/2  dp(s)  (5.3-3) 
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Pn(aAt)  -  AtHf(H+5) 


0 

J  t<-  s')H-l/2  .  (_s')H-l/2]  dp(s') 

.-oo 


a 

+  J  (a-s')^1/2  dp(s')  • 
0 


=  AtH3n(a) 


(5.3-4) 


5.4  Autocorrelation  of  fBm 

In  order  to  find  the  autocorrelation  function  of  fBm  it  is  first  useful  to 
find  the  variance  of  an  fBm  increment.  By  self-similarity 

pCPnCt+At)  -  Ph(0]  =  p[AtHPn(l)]  (5.4-1) 

Thus  in  the  case  where  the  Brownian  motion  process  P(t)  of  Equation  (5.2-1) 
has  unit  variance,  the  variance  of  an  fBm  increment  is  given  by 

E{[pH(t+At)-pH(t)]2}  =  E{  [AtHpH(l)l2 } 

=  At2H  E{pH(l)2} 

=  At2H  VH  (5.4-2) 

where 

VH  =  E{Ph(D2}  (5.4-3) 

The  parameter  Vh  is  only  a  function  of  H,  so  it  would  seem  that  H  is  the  only 
parameter  required  to  characterize  fBm.  However,  it  is  possible  to  create  a 
more  "noisy"  sample  path  by  multiplying  the  sample  path  of  Equation  (5.2-1) 
by  a  positive  constant  A.  As  can  be  seen  by  examining  Equation  (5.4-2)  this 
would  have  the  effect  of  scaling  the  variance  of  an  increment  by  A2.  Thus  a 
new  parameter  will  be  introduced  to  account  for  possible  scaling  of  the  fBm 
process: 

°H  =  A2yH  (5-4-4) 
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This  parameter  is  introduced  because  it  is  common  practice  ([4],  [24],  [26])  to 
consider  Vh  to  describe  the  unsealed  process.  In  this  thesis,  the  parameters  H 
and  an  will  be  used  to  completely  characterize  an  fBm  process. 

Assuming  that  Ph(0)  =  0/  so  that  E{pH(0)PH(ti)}  =  0,  the  autocorrelation 
function  is  now  given  by 

<t>pp(ti,t2)  =  E{pn(ti)PH(t2)} 

=  E{  -pH(0)pH(tl)  -  PH(0)PH(t2)  +  pH(tl)PH(t2>  } 

=  \  E{  Ph(0)2  -  2Pn(0)pH(ti)  +  pH(ti)2 
+  Ph(0)2  -  2pH(0)pH(t2)  +  pH(t2)2 

‘  pH(tl)2  +  2pH<ti)pH(t2)  *  pH(t2)2  ) 

=  \  E{  [pH(tl)  -  Ph(0)]2  +  [pH(t2>  -  Ph(0)]2  -  [pH(t2>  -  pH(tl)]2  } 
=  \  [  ltil2HaH+  1 12  1 2H  -  1 12  -  ti  I  CTh  ] 


=  \  (£[  lti|2H+  lt2l2H-  1 12  -  t!  I  ] 


(5.4-5) 


When  H  =  2  / 

2 

<t>pp(ti,t2)  =  aH  min(ti,t2) 

which  is  the  autocorrelation  function  of  ordinary  Brownian  motion. 


Note  also  that  <[>pp  is  a  function  of  the  absolute  times  ti  and  t2.  This 
makes  fractional  Brownian  motion  a  nonstationary  process.  It  is  also  useful 
to  note  that  because  fBm  is  a  zero  mean  process  its  covariance  is  equal  to  its 
autocorrelation. 
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5.5  Autocorrelation  of  the  Increments 

It  will  be  useful  to  know  the  autocorrelation  function  of  the 
increments  of  an  fBm  process.  The  increments  themselves  are  often  referred 
to  as  discrete  fractional  Gaussian  noise  as  a  generalization  of  the  Gaussian 
noise  increments  which  comprise  ordinary  Brownian  motion. 

Let  the  fBm  increment  X(tk)  at  time  tk  =  kAt  be  defined  to  be 

X(tk)  =  pH(tk+l)  -  Pti(tk)  (5.5-1) 

then  the  correlation  between  two  increments  is  given  by 

^xx^k+n/hc)  =  E{  X(tk+n)X(tk)  } 

=  E{  [pH(tk+n+l)  "  pH(tk+n)l  [pH(tk+l)  -  PH(tk)]  ) 

=  E{  pH(tk  +n+  l)pH(tk+l)  -  pH(tk+n+l)PH(tk) 

-  pH(tk+n)pH(tk+l)  +  pH(tk+n)pH(tk)  ) 

=  -  \  E{  IpffOk+n+l)  -  pH(tk+l)32  )  +  \  E{  [Pnltk+n+l)  *  pH(tk)]2  ) 

+  \  E{  [PH(tk+n)  -  PH(tk+l)]2  }  -  \  E{  iPH(tk+n)  -  PH(tk)l2  } 

=  - 1  (Ini  At)2H  +  |  ( I  n+1 1  At)2H 

+ 1  ( I  n-1 1  At)2H  1  (Ini  At)2H  by  Equation  (5.4-2) 

=  \  At2Ha^  [  I  n+1 1 2H  -  2 1 n  1 2H  +  ln-ll2H]  (5.5-2) 

Note  that  (j)Xx(tk  +  n,tk)  =  <t>xx(n)  so  that  the  increments  are  correlation 
stationary.  Also,  because  the  increments  are  zero  mean,  the  correlation 
function  is  equal  to  the  covariance  of  the  increments  just  as  in  the  case  of  the 
fBm  process  itself. 
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5.6  fBm  Derivative  Process 

It  is  possible  to  define  a  derivative  process  for  fBm  as  follows  [26]: 

PH,8(t)  =  l  [{3h(1+S)  -  Ph(0]  (5.6-1) 

In  the  limit  as  8  approaches  zero,  this  process  does  not  exist  because  fBm  is 
not  differentiable  [26].  However,  this  process  is  an  approximation  to  the 
derivative  of  fBm  which  will  allow  the  calculation  of  the  autocorrelation 
function  and  PSD  of  fractional  Gaussian  noise. 


5.6.1  Autocorrelation  of  fBm  Derivative  Process 

The  autocorrelation  function  of  Equation  (5.6-1)  is  given  by 

<t>pp(t2/tl)  =  ^  E{  [pH(t2+S)  -  pH(t2)][PH(tl+5)  -  PhOiXI  1 

=  \oH  8-2  [  (X  +  5)2H  -2t2H  +  I X -  8 1 2H  ]  (5.6.1-1) 

where  x  =  t2  -  tj  and  t2  >  ti.  Holding  x  fixed  and  taking  the  limit  as  8 
approaches  zero  leaves 

<[>pp(x)  =  o2h  H(2H-1)  I  x  1 2H-1  (5.6.1-2) 

after  applying  L'Hopital's  Rule  twice. 


5.6.2  Power  Spectral  Density  of  fBm  Derivative  Process 

Because  the  autocorrelation  of  real  data  is  an  even  function,  the 
Fourier  transform  of  any  real-valued  autocorrelation  function  may  be  given 
in  terms  of  Fourier  cosine  coefficients  instead  of  the  complex  exponential 
coefficients  used  throughout  Chapter  3.  Using  this  method,  the  PSD  of  the 
derivative  process  is 
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4>pp(f)  =  \  8"2  J  [  (x  +  8)2H  -2x2H  +  |  x  -  8 1 2H  ]  cos(27ifx)  dx  (5. 6.2-1) 

— oo 

which,  under  the  assumption  that  8f  is  small,  reduces  to  [26] 

<frpp(f)  =  2  aH  82H-1  <D*  (f)  (5.62-2) 

where 

<J>*p(f)  =  KH(2^8f)1'2H  (5.62-3) 

kH  2  ^  [cos  jc(H-1)]-1  (5.62-4) 

Thus  the  fBm  derivative  process  itself  may  be  used  to  model  a  system 
with  log-log  PSD  slope  equal  to  1-2H.  Also,  by  Equation  (4.4-1),  fractional 
Brownian  motion  models  systems  with  log-log  PSD  slope  equal  to  -(1+2H). 


5.7  Summary  of  fBm  Properties 

Fractional  Brownian  motion  is  a  nonstationary  Gaussian  process  that 
has  stationary,  zero  mean,  Gaussian  increments.  In  contrast  with  standard 
Brownian  motion,  the  increments  of  fBm  are  not  independent  of  each  other. 
This  makes  fBm  a  non-Markov  process.  The  increments  of  fBm  are  also  self¬ 
similar  (Equation  (5.3-1)).  Signals  with  stationary,  self-similar  increments  are 
also  known  as  fractals  [25]. 

Some  other  properties  of  fBm  that  are  shown  in  Reference  [26]  are  that 
it  is  mean  square  continuous,  almost  all  of  its  sample  paths  are  continuous, 
and  it  is  almost  surely  not  differentiable.  Note  that  these  are  all  properties  of 
Brownian  motion,  which  is  the  special  case  of  fBm  with  H  =  ^ . 

Reference  [26]  also  shows  that  fBm  is  unique  in  that  (1)  if  a  stochastic 
process  has  stationary  and  self-similar  increments  with  parameter  H  and  is 
mean  square  continuous,  then  0  <  H  <  1,  and  (2)  a  Gaussian  stochastic  process 
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with  self-similar  and  stationary  increments  is  fractional  Brownian  motion 
times  a  constant  plus  an  offset. 


5.8  Simulation  of  fBm 

Just  as  in  the  case  of  Markov  stochastic  processes,  it  is  useful  to  be  able 
to  simulate  fractional  Brownian  motion  using  a  digital  computer.  Simulated 
sample  paths  are  used  to  test  the  estimators  developed  in  the  Chapter  7.  Two 
methods  of  simulating  fBm  are  presented  below.  Simulations  were  run 
using  both  methods,  and  each  produced  data  with  the  expected  log-log  PSD 
slopes.  For  the  analysis  and  plots  presented  in  this  thesis,  the  method 
described  in  Section  5.8.2  was  used. 


5.8.1  Discrete  Approximation  to  fBm  Integral 


It  is  possible  to  simulate  fBm  with  a  brute  force  discrete  approximation 
to  the  fBm  integral  given  by  Equation  (5.2-1).  This  simulation  is 
implemented  as  an  approximation  to  the  Stratonovich  stochastic  integral, 
which  is  the  same  as  the  Ito  integral  in  this  case,  in  the  following  form: 


pH(tk)  -  r(H+.5) 


[tk  -  (ti+f)]H-l/2  -  (-ti+f  )H-l/2f  [p(ti>  -  p(tu)] 


k-1  A.  I 

+  X  [tk  -  (ti+y)]H-l/2  [p(ti>  -  p(tM)]  (5.8.1-1) 

i=0  J 


where  At  is  the  desired  time  step  and  tj  =  iAt.  In  this  formula,  N  must  be  very 
large  with  respect  to  k  in  order  to  achieve  sufficient  accuracy.  The  term 
p(tj)  -  fKtj-i)  is  a  random  walk  increment  which  is  simulated  as  At1/,2w(tj), 
where  w(tj)  is  a  zero  mean,  unit  variance,  Gaussian  random  variable 
produced  by  a  random  number  generator.  Multiplying  Equation  (5.8. 1-1)  by  a 
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.  2 
positive  constant  A  will  create  an  fBm  sample  path  with  oH  equal  to  A2Vh/ 

where  Vh  is  defined  by  Equation  (5.4-3). 

This  method  is  computationally  slow  because  of  the  necessity  to  make 
N  large.  Experience  with  the  method  showed  that  using  N  equal  to  64k  was 
sufficient  to  produce  good  results.  However,  this  requires  the  generation  of 
65  times  as  many  Gaussian  random  variables  as  there  will  be  points  in  the 
resulting  fBm  sample  path. 


5.8.2  Correlated  Increments  Method 

An  alternate  way  to  simulate  fBm  is  to  take  advantage  of  the  known 
correlations  between  the  increments  Pn(tj)  -  PH(tj-i).  It  is  possible  to  generate 
N  independent  Gaussian  increments  and  then  to  transform  them  so  that  they 
have  the  desired  correlations.  This  is  done  by  creating  a  matrix  which 
contains  the  desired  correlations  between  the  various  increments,  i.e.  the  ijth 
element  of  the  matrix  is  given  by 

Cij  =  <}>xx(n)  (5.8.2-1) 

where  (|>xx(n)  is  as  defined  in  Equation  (5.5-2),  and  n  =  I  i  -  j  I . 

The  correlation  matrix  C  may  then  be  factored  into  the  product  of  a 
lower  triangular  symmetric  matrix  and  its  transpose  using  a  Cholesky 
decomposition  [40]: 

C  =  LLt  (5.8.2-2) 

This  is  possible  because  the  correlation  matrix  is  positive  definite  [24].  The 
positive  definiteness  of  the  correlation  matrix  also  guarantees  the 
invertibility  of  L  [40].  It  is  now  possible  to  define  a  vector  of  random  variables 
y  such  that 

y  =  L/*x  (5. 8.2-3) 

If  x  is  zero  mean,  the  variance  of  y  is  given  by 
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E{  y  yT }  =  E{  L_1x  xT  L-T} 

=  l-illtl-t 

=  I  (5.8.2-4) 

Thus  the  transformed  variables  are  uncorrelated  with  unit  variance. 

The  fBm  simulation  procedure  consists  of  generating  the  correlation 
matrix  C  based  upon  the  desired  fBm  H  and  an  parameters,  factoring  C  into  L 
and  LT,  generating  a  vector  y  of  N  independent  Gaussian  random  variables 
with  unit  variance,  and  performing  the  transformation 

x  =  Ly  (5.8.2-5) 

to  arrive  at  N  fBm  increments  with  the  desired  correlations.  The  increments 
may  then  be  summed  to  produce  the  fBm  sample  path. 

This  method,  which  was  taken  from  Reference  [24],  has  the 
disadvantage  of  requiring  the  Cholesky  decomposition  of  the  N  x  N  matrix  C. 
This  requires  an  extremely  large  number  of  operations  as  N  becomes  large. 
Even  so,  this  method  is  still  much  quicker  than  the  method  of  Section  5.8.1, 
and  the  Cholesky  decomposition  routine  itself  is  commonly  available  in 
software  libraries. 
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Chapter  6 


Application  of  Maximum  Likelihood 
Estimation  to  System  Identification 


6.1  Estimation  of  System  Dynamic  and  Markov  Noise  Parameters 

Maximum  likelihood  system  identification  is  a  method  which 
combines  a  linear  optimal  filter  with  a  maximum  likelihood  estimator  to 
determine  unknown  dynamic  and  noise  parameters  in  a  given  system  model. 
An  extended  Kalman  filter  rather  than  this  technique  is  commonly  used  to 
determine  system  parameters. 

Maximum  likelihood  system  identification  has  been  presented  under 
various  designations  by  different  authors  [15],  [29],  [31],  [37],  [39].  It  will  be 
called  Full  Information  Maximum  Likelihood  Optimal  Filtering  (FIMLOF)  in 
this  thesis,  a  designation  commonly  used  at  Draper  Laboratory  if  not  in  the 
rest  of  the  community  [16],  [41].  "Full  Information"  refers  to  the  fact  that 
stochastic  as  well  as  dynamic  parameters  can  be  estimated,  "Maximum 
Likelihood"  refers  to  the  use  of  a  Fisher  estimator  to  determine  the 
parameters,  and  "Optimal  Filtering"  refers  to  the  presence  of  a  Kalman  filter 
in  the  algorithm. 

The  basic  concept  behind  FIMLOF  is  to  fix  the  unknown  parameters, 
propagate  the  states  with  a  Kalman  filter  that  is  based  upon  those  parameters, 
build  a  likelihood  function  with  the  information  obtained  from  the  filter, 
adjust  the  parameters  to  maximize  the  likelihood  function  with  respect  to  the 
unknown  parameters,  and  repeat  the  process  with  the  new  parameter  values 
until  the  parameter  estimates  converge. 
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Alternating  between  the  maximum  likelihood  parameter  estimate 
iteration  and  the  Kalman  filter  state  propagation  is  not  the  same  as 
alternatingly  holding  some  parameters  fixed  and  solving  for  others  in  pure 
maximum  likelihood  or  least  squares  estimation.  The  latter  procedure  is 
definitely  incorrect.  The  former  procedure  works  very  well  in  practice, 
because  the  state  initial  conditions  are  among  the  estimated  parameters  and 
the  Kalman  filter  serves  to  account  for  noise  in  the  dynamics  that  maximum 
likelihood  estimation  could  not  handle  alone. 


6.2  System  Model 

The  system  of  interest  will  be  assumed  to  be  a  linear,  time  invariant 
system  driven  by  white  noise  with  white  measurement  noise.  The  Ito 
stochastic  differential  equation  model  for  the  state  vector  x  is 

dx  =  A'x  dt  +  B'u(t)  dt  +  L'  (6.2-1) 

where  u(t)  is  a  deterministic  input  vector  and  fi.(t)  is  a  Weiner  process. 
Integration  of  the  equation  and  discretization  yields  a  discrete  time,  state  space 
model: 


x(tk+i)  =  Ax(tk)  +Bu(tk)  +  L£(tk) 

with  measurements 


z(tk+i)  =  Cx(tk+i)  +  fi(tk+i) 


(6.2-2) 


(6.2-3) 
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This  model  employs  the  following  definitions: 


x(tk) 

e  9xn  : 

state  vector  at  time  tk 

u(tk) 

e  : 

deterministic  input  vector  at  time  tk 

£(tk) 

e  SKP  : 

white  plant  noise  vector  at  time  tk 

z(tk) 

e  9\r  : 

measurement  vector  at  time  tk 

ectk) 

e  9ir  : 

white  measurement  noise  vector  at  time  tk 

tk 

time  index  (k  =  0, 1,  2, ...) 

A 

e  9inxn  : 

state  transition  matrix 

B 

£  9inxn  : 

system  deterministic  input  matrix 

C 

£  TvriXn  : 

system  output  matrix 

L 

£  91nxn  : 

system  plant  noise  input  matrix 

The  initial  state,  x(to),  is  a  Gaussian  random  variable  with  mean 
E{x(to)}  =  x0  and  covariance  E{  [x.(to)  -  ^o][x.(to)  -  }  =  S0.  The  plant 

and  measurement  noise  processes  are  zero  mean,  discrete,  white  noises  with 
covariance  E{^(tj)^(tk)}  =  H5jk  and  E{£(tj)fl(tk) }  =  ©8jk  respectively.  This 
method  will  also  assume  that  all  of  the  measurement  variables  are  corrupted 
by  the  measurement  noise  and  that  the  input  vector  u(tk)  is  known. 


6.3  Kalman  Filter  Equations 

The  Kalman  filter  is  the  most  common  metnod  of  estimating  the  states 
of  a  linear  dynamic  system.  This  filter  is  optimal  in  the  sense  that  it 
minimizes  the  mean  square  error  of  the  state  estimates.  The  well  known 
equations  for  the  filter  are  summarized  below  [14]. 

Kto)  =  I0 
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x(to)  =  Xo 

x(tk+i  I  tk)  =  A£(tk  I  tk)  +  Bu(tk)  (6.3-3) 

S(tk+1 1  tk)  =  AX(tk  I  tk)AT  +  LS(tk)LT  (6.3-4) 

K(tk+1)  =  E(tk+1 1  tk)CT[CI(tk+1 1  tk)CT  +  ©(tk^)]'1  (6.3-5) 

x(tk+i  I  tk+i)  =  x(tk+i  I  tk)  +  K(tk+!)[z(tk+!)  -  Cx(tk+!  I  tk)]  (6.3-6) 

I(tk+1 1  tk+i)  =  [I  -  K(tk+i)C]  I(tk+1 1  tk)  (6.3-7) 


In  the  formulation  above,  the  notation  £(tk+i  I  tk)  means  the  covariance  of  the 
estimates  at  time  tk+i  given  all  measurements  through  time  tk.  The  same 
notation  applies  to  the  estimate  of  the  states,  x(tk+i  I  tk). 

The  Kalman  filter  is  characterized  by  a  prediction  cycle  (Equations  6.3-3 
and  6.3-4)  and  an  update  cycle  (Equations  6.3-6  and  6.3-7).  The  pre-update 
residual,  or  innovation,  is  defined  as 

r(tk+i)  =  z(tk+i)-z(tk+l  I  tk)  (6.3-8) 

where 

z(tk+i  I  tk)  =  Cx(tk+1  I  tk)  (6.3-9) 

The  mean  of  this  residual  is 

E{r(tk+i)l  =  E{z(tk+i)  -  z(tk+i)} 

=  E{Cx(tk+i) +  fi(tk+i)  -  Cx(tk+i  I  tk)} 

=  Cx(tk+i 1  ^  "  Cx(tk+i  I  tk) 

=  0  (6.3-10) 

The  covariance  S(tk+i)  of  the  pre-update  residual  is 
S(tkvi)=  E{r(tk+i)  l(tk+i)Tl 

=  El[Cx(tk+i)  +  fi(tk+i)  -  Cx(tk+i  !  tk)][Cx(tk+i)  +fi(tk+l)  -  Cx(tk+i  I  tk)]T} 
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=  CX(tk+1ltk)CT  +  0(tk+1)  (6.3-11) 

In  Section  6.4,  it  will  be  useful  to  know  the  conditional  probability 
density  of  the  measurement  z(tk)  at  time  tk  given  all  measurements 
zk-l  =  [z(tk-i),  z(tk-2)r  •••/  z(to)]  up  to  tk-i,  where  z(to)  contains  the  initial 
condition  information.  Using  Equations  (6.3-10)  and  (6.3-11)  along  with  the 
assumption  that  all  of  the  noise  processes  are  Gaussian,  this  density  may  be 
written  as 

p(z(tk)lzk-l)  =  (2jc)'r/2det[S(tk)]‘1'/2  e  ’  k(tk)T  s(tk)"1I(tk)]/2  (6  3.12) 
where  det(S)  is  the  determinant  of  the  matrix  S. 


6.4  The  Likelihood  Function 

As  stated  in  Chapter  2,  the  goal  of  maximum  likelihood  estimation  is 
to  maximize  the  joint  probability  density  of  the  measurements  with  respect  to 
the  unknown  parameters.  In  this  way,  the  parameter  values  are  chosen  that 
maximize  the  probability  that  the  measurements  that  did  occur  would  have 
occurred. 

The  notation  p(zN;<&)  indicates  the  joint  probability  density  function  of 
all  of  the  measurements  through  time  tN-  The  vector  jx  is  composed  of  the 
unknown  parameters.  The  expression  p(zN;a)  is  actually  a  family  of  densities 
that  are  indexed  by  the  different  values  of  the  parameters  [37].  The  probability 
density  may  be  expanded  using  conditional  densities 

p(zN;&)  =  p(z(tN)  I  zN'1;^) . . .  p(z(ti)  I  z(to);&)  p(z(to);fi)  (6.4-1) 

Since  p(z(tf<j)  lzN_1;&)  as  given  by  Equation  (6.3-12)  is  a  function  of  r(tk)  and 
S(tk)  only,  the  likelihood  function  may  be  computed  using  quantities 
calculated  by  the  Kalman  filter. 

It  will  simplify  the  computations  if  In[p(zN;fl)]  is  maximized  in  place  of 
the  density  itself.  This  eliminates  the  exponential  and  converts  the  products 


83 


MAXIMUM  LIKELIHOOD  ESTIMATION  OF  FRACTIONAL  BROWNIAN  MOTION 

into  sums.  This  simplification  is  made  possible  by  the  fact  that  the  natural 
logarithm  increases  monotonically.  By  Equation  (6.4-1), 

N 

ln[p(zN;&)]  =  £  ln[p(z(tk)  1 2k_1;a)]  (6.4-2) 

k=0 

where  p(z(0)  I  Z'Tg)  =  p(z(0);cc). 

Using  Equation  (6.3-12) 

ln[p(z(tk)  I  z14-1;^)]  =  -  \  ln(2it)  -  \  ln{det[S(tk;a)]}  -  \  rdu^^dufiO^^tiofi)  (6.4-3) 

Because  the  leading  term  is  a  constant,  it  will  not  affect  the  parameters  that 
maximize  the  function  and  it  can  be  ignored.  Finally,  changing  the  sign  of 
the  remaining  terms  produces  the  negative  log-likelihood  function  without 
the  constant  term: 

N 

C(zN;fl)  s  £(z(tk)  Iz^cc)  (6.4-4) 

k=0 


where 

£(z(tk)  I  zk-T;fi)  s  |  ln{det[S(tk;<20]}  +  \  r  ( tk;a)TS( t^ix)'1  r  ( t^H)  (6.4-5) 

This  negative  log-likelihood  function  may  be  assembled  recursively  as 
the  Kalman  filter  processes  the  measurements.  The  maximum  likelihood 
estimate  ql  of  the  parameters  a  will  be  the  parameter  values  which  minimize 
Equation  (6.4-4). 


6.5  Minimizing  the  Negative  Log-Likelihood  Function 

There  are  several  numerical  methods  available  to  minimize  the 
negative  log-likelihood  function.  Newton-Raphson  iteration  is  chosen 
because  it  offers  relatively  fast  convergence  and  because  it  is  possible  to 
analytically  determine  the  gradient  and  an  approximation  to  the  Hessian  of 
£(zN;&)- 
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Let  a.  =  [ai, ...,  aq]T  be  the  maximum  likelihood  estimates  of  the 
parameters.  For  these  values  the  negative  log-likelihood  function  is 
minimized 

£(zN;&)  =  minimum  (6.5-1) 

This  means  that  the  gradient  of  the  function  evaluated  using  these  values  for 
the  parameters  is  zero 

.  =0  (6.5-2) 

doq  Q,=£l 

If  tto  =  [alo,...,aqo]T  are  the  first  guesses  at  the  values  of  the  parameters  and  if 
Equation  (6.5-2)  is  expanded  in  a  Taylor  series  about  these  guesses,  the  result  is 


dC(zN;cc)  I 

doq  a=f£ 


t  £  MS  |  Actj ,  i  =  1 
doq  p\  doqdaj  ' 


q  (6.5-3) 


where  Aaj  =  aj  -  Oj0. 

As  in  Section  2.6,  this  result  leads  to  the  following  set  of  linear 
equations  for  the  adjustments  Actj  to  the  first  guesses  aj0  towards  the 
maximum  likelihood  estimates  aj: 

q 

Ajj  Actj  =  Bj ,  i  =  1, ...,  q  (6.5-4) 

j=l 

where  by  Equations  (6.4-4)  and  (6.5-3) 


N 


B;  =  - 


k=0 


aC(z(k)lz^;ffl) 

doq 


'a=fl0 


,  i  =  1, ...,  q 


(6.5-5) 
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N 


9^(z(k)  I  zk-1;^) 


k=0 


daida 


) 


'£=£o 


,  i,j  =  1, ...,  q  (6.5-6) 


In  the  equations  above,  B  and  A  are  the  negative  gradient  and  the  Hessian  of 
the  negative  log-likelihood  function  respectively.  The  maximum  likelihood 
estimate  of  the  parameters  is  found  by  solving  Equation  (6.5-4)  iteratively 
until  Aa  approaches  zero. 


6.6  Derivatives  of  the  Likelihood  Function 

In  order  to  use  the  preceding  minimization  method  the  first  partial 
derivatives  of  the  negative  log-likelihood  function  with  respect  to  the 
unknown  parameters  are  calculated  analytically.  These  derivatives  are  the 
gradient  and  they  are  used  to  form  an  approximation  to  the  Hessian.  The 
gradient  follows  directly  from  Equations  (6.4-4)  and  (6.4-5).  It  is 

3C(zN;<i)  v  dC(z(tk)lzk-,;a)  . 

-a5r"k4-0 — 55; — ’  l  =  1- -•'*  (6M) 


where 


3C(z(tk)lzk-!;a) 

3aj 


=  r(tk;a)TS(tk;fl)'1 


glOjog) 

Ba, 


-  ~  r(tk;fl)TS(ti0-ffi)'1^^^S(tk;(i)'1r(tk;fic) 
2  0a, 


+  —  tr 


S(tk;i20 


.10S(tuflt) 


0a, 


(6.6-2) 


Appendix  B  contains  the  derivation  of  the  last  term  which  is  the  derivative  of 
the  natural  logarithm  of  the  determinant. 

If  the  definition  of  r  and  the  formula  for  S,  Equation  (6.3-11),  are 
substituted  into  Equation  (6.6-2),  the  gradient  may  be  calculated  analytically 
without  much  difficulty. 
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In  order  to  avoid  calculating  the  second  partial  derivatives  of  the 
negative  log-likelihood  function,  the  Fisher  information  matrix  is  used  as  an 
approximation  to  the  Hessian.  Recall  from  Section  2.3  that  the  Fisher 
information  matrix  is  equal  to  the  expected  value  of  the  Hessian.  Assuming 
the  Hessian  is  approximately  equal  to  its  expected  value,  Equation  (2.3-2)  may 
be  used  to  express  Equation  (6.5-6)  as 

A'' 2  k?0  Ei  as;  ^  I  (6'6‘3) 


This  Fisher  information  matrix  approximation  is  valid  when  the 
deterministic  input  is  very  large  with  respect  to  the  stochastic  input  to  the 
system  or  when  the  observation  interval  [0,  t^l  is  much  greater  than  the 


correlation  times  of 


3r(tk;ec)  ,  3r(tk;&) 


9a, 


and 


9ctj 


[37]. 


Equation  (6.6-3)  may  be  manipulated  to  arrive  at  the  more  useful  form 
given  by  Equation  (6.6-4).  The  derivation  of  this  equation  in  the  special  case 
of  a  scalar  measurement  is  presented  in  Section  6.8.  The  derivation  of  the 
general  form  is  contained  in  Appendix  C. 


Aij 


k=0 


9r(tk;ffl)T 

0Oi  8(Xj 


sitter1 


+ 


1  9S(tk;a) 

2  8ai 


stor1 


9S(tk;g) 

9aj 


Sltj^ot)'1 


(6.6-4) 


The  Fisher  information  approximation  allows  the  use  of  a  Newton- 
Raphson  iteration  which  does  not  require  the  difficult  task  of  calculating  the 
second  partial  derivatives  of  the  negative  log-likelihood  function.  This 
method  is  known  as  Gauss-Newton  iteration. 

There  is  evidence  that  the  Fisher  information  approximation  is  better 
than  using  the  exact  Hessian.  Namely,  the  second  partial  derivatives  were 
coded  for  the  model  in  Section  6.8.  The  maximum  likelihood  iteration  did 
not  converge  with  the  exact  Hessian,  but  it  did  converge  with  the  Fisher 
information  approximation  to  the  Hessian.  The  computer  code  for  the  first 
and  second  partial  derivatives  of  the  negative  log-likelihood  function  was 
checked  by  the  difference  method  (Appendix  D.4),  so  this  result  is  probably 
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not  due  to  a  coding  error.  Statistical  variations  apparently  make  the  exact 
Hessian  less  tractable  than  the  expected  value  of  the  Hessian,  at  least  when 
the  first  guess  for  the  parameters  is  not  close  to  the  true  values.  For  example, 
the  Hessian  may  have  some  negative  eigenvalues,  while  the  Fisher 
information  approximation  is  guaranteed  to  be  positive  definite.  See  also 
References  [15],  [41].  In  this  thesis,  all  minimizations  were  performed  using  a 
pure  Gauss-Newton  iteration. 


6.7  Implementation 

The  FIMLOF  software  algorithm  consists  of  implementing  the  Kalman 
filter  described  by  Equations  (6.3-3)  through  (6.3-7).  Using  a  first  guess  for  the 
parameters,  this  filter  calculates  the  residual  and  error  covariance  at  each 
measurement.  These  values,  along  with  their  partial  derivatives  are  then 
incorporated  into  the  negative  log-likelihood  function,  its  gradient,  and  the 
approximation  to  its  Hessian.  Once  all  the  measurements  have  been 
processed,  a  single  Gauss-Newton  iteration  is  performed  to  update  the 
parameter  estimates.  The  process  is  repeated  with  a  new  Kalman  filter  which 
is  based  upon  the  improved  parameter  estimates,  and  iterations  continue 
until  the  parameters  converge  satisfactorily. 

FORTRAN  software  was  written  to  implement  the  FIMLOF  algorithm. 
Much  of  the  code  is  for  systems  of  general  dimensions,  but  some  subroutines 
were  specialized  to  scalar  rather  than  vector  observables  and  to  the  specific 
two  state  model  of  Section  6.8.  The  software  uses  the  square  roots  of  the 
diagonal  elements  of  the  Fisher  information  matrix  (calculated  during  the 
maximum  likelihood  iteration)  as  a  measure  of  the  uncertainty  of  the 
parameter  estimates,  by  the  Cramer-Rao  lower  bound  discussed  in  Section  2.4. 

It  can  be  shown  that  for  a  time  invariant,  linear  Gaussian  model  with 
stationary  noise  processes,  the  FIMLOF  estimate  will  display  the  properties 
which  make  classical  maximum  likelihood  estimates  attractive  [37].  That  is, 
the  FIMLOF  estimate  is  asymptotically  consistent,  unbiased,  normally 
distributed,  and  efficient.  As  stated  in  Chapter  2,  this  means  that  the  Cramer- 
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Rao  lower  bound  is  attained  in  the  limit  of  a  large  number  of  observations. 
Thus  the  accuracy  of  the  parameter  estimates  can  be  readily  evaluated. 


6.8  Application  of  FIMLOF  to  an  Approximation  to  fBm 
6.8.1  Example  of  Experimental  Data  with  a  -1  log-log  PSD  Slope 

As  a  nonMarkov  process,  fractional  Brownian  motion  is  not  suitable 
for  estimation  using  a  Kalman  filter,  since  the  derivation  of  the  Kalman  filter 
assumes  Markov  plant  noise  [18].  This  means  that  FIMLOF  cannot  be  applied 
to  estimate  fBm  parameters.  However,  it  is  possible  to  produce  a  system 
which  has  a  power  spectral  density  with  a  -1  slope  over  a  finite  frequency 
range  by  summing  several  Markov  noise  processes  [21].  An  example  of  this 
procedure  using  experimental  data  is  now  presented. 

Figure  6.8-1  shows  the  output  of  an  accelerometer  which  is  under  a 
constant  -1  g  acceleration.  The  plot  was  produced  by  sampling  the  output  at  1 
Hz  and  then  averaging  every  10  data  points.  The  data  covers  a  time  span  of 
5.7  hours.  Figure  6.8-2  is  a  PSD  of  the  1  Hz  data  using  32,768  (215)  points. 
Below  0.05  Hz,  this  plot  has  the  -1  log-log  slope  which  is  characteristic  of  fBm. 
Above  this  frequency,  the  plot  transitions  through  a  zero  slope  to  a  +2  log-log 
slope.  A  slope  of  zero  is  indicative  of  the  presence  of  white  noise  and  a 
+2  slope  is  an  attribute  of  quantization  noise.  Figure  6.8-3  is  a  PSD  of  the  10 
second  averaged  data.  It  shows  that  the  averaging  removes  some  of  the 
quantization  noise  while  adding  some  energy  to  the  lower  frequencies  due  to 
aliasing.  The  estimator  will  use  this  averaged  data  in  order  to  decrease  the 
number  of  measurements  while  still  covering  a  long  time  span. 
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801 - * - ■ - ■ - * - 1 
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Time  (sec)  xKM 

Figure  6.8-1  Accelerometer  Output  Averaging  10  sec  Intervals 
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Figure  6.8-2  PSD  of  Accelerometer  Output  Using  32,768  Points 
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Figure  6.8-3  PSD  of  Accelerometer  Output  Using  2048  Points  of 

10  sec  Averaged  Data 


6.8.2  Second  Order  Scalar  Observable  FIMLOF  Model 

The  fact  that  this  problem  involves  a  scalar  measurement  simplifies 
the  calculations  and  makes  it  a  good  example  to  more  clearly  demonstrate  the 
FIMLOF  process.  The  fractional  Brownian  motion  contained  in  the  signal 
will  be  approximated  by  the  combination  of  a  trend,  random  walk,  and 
exponentially  correlated  noise  which  are  all  Markov  noise  processes.  The 
system  model  will  also  include  white  measurement  noise. 

Let  (xi,  X2)  be  states  with  xi  equal  to  a  trend  plus  random  walk  and  X2 
equal  to  an  exponentially  correlated  noise.  Let  the  deterministic  input 
function  u(tk)  be  a  unit  step,  and  assume  that  At  =  tk  -  tk-i  is  the  constant  time 
spacing  of  the  measurements  in  seconds.  Also  make  the  following 
definitions: 
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a  :  trend  parameter 

b  :  random  walk  standard  deviation 

q  :  inverse  of  the  exponentially  correlated  noise  time  constant 
C2  :  exponentially  correlated  noise  scaling  parameter 

v  :  white  noise  standard  deviation 


Then  following  the  notation  of  Section  6.2,  this  model  is  written  as 


x(tk+l)  — 


1  0 
0  e-c'AtJ 


x(tk)  + 


aAt 

0 


u(tk)  + 


b  0 
0  C]C2  J 


z(tk+l)  =  [11]  x(tk+l)  +  fi(tk+l) 
H(tk)  =  At 


1  0 
0  1 


€Ktk)  =  [v2] 


£(tk)  (6.8.2-1) 

(6.8.2-2) 

(6.8.2-3) 

(6.8.2-4) 


The  state  equation  (6.8.2-1)  for  X2  is  the  discrete  form  of  Equation  (4.8-1). 

Based  upon  the  above  formulation,  the  parameters  to  be  estimated  are 
a,  b,  ci,  C2,  and  v.  Four  more  parameters  are  needed  because  the  Kalman  filter 
requires  initial  conditions  on  the  state  estimates  and  their  covariance. 


xo  = 


£o  = 


XlO 
X20  J 

s2  0 

0  s^J 


(6. 8.2-5) 


(6. 8.2-6) 


The  initial  covariance  matrix  is  assumed  diagonal  because  the  noise  states  are 
independent  in  the  model.  The  initial  conditions  are  estimated  along  with  a, 
b,  ci,  C2,  and  v,  but  si  and  S2  are  assumed  to  have  known  values.  The 
derivation  of  the  estimator  will  include  the  formulas  involving  si  and  S2  for 
completeness. 


The  Kalman  filter  equations  for  this  system  are  well  defined  and  follow 
directly  from  those  listed  in  Section  6.3.  In  order  to  perform  the 
minimization  of  the  negative  log-likelihood  function  using  the  gradient  and 
approximation  to  the  Hessian  which  are  given  by  Equations  (6.6-2)  and  (6.6-4) 
the  partial  derivatives  of  r(tk+i;fi),  which  is  a  scalar,  and  S(tk+i;et)  are  required. 
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Expressions  for  these  partial  derivatives  are  formed  using  the  Kalman  filter 
equations  which  are  specific  to  the  system. 

Recall  from  Equation  (6.3-11)  that  in  the  scalar  measurement  case  the 
error  covariance  of  the  residuals  is 

S(tk+1)  =  E{  r(tk+1)2  } 


=  ^lidk+l  •  tk)  +  Zi20k+l  I  tk) 

+  ^l^k+l  I  Ik)  +  ^22^k+\  I  tk)  +  v2 


(6.82-7) 


where  the  explicit  dependence  of  r  and  Sonfi  has  been  dropped  for  clarity. 
Then  the  Kalman  filter  gain  matrix  is 

K«ktl)  =  S(tk+,  I  tk)  CT  [  C  Htk+1  I  lk>  CT  +  v2  ]-l 


1  +  £12 

SCht+l)  _  ^21  +  X22 

The  covariance  matrix  of  the  updated  state  is 


£(tk+l  dic+i) 


[I  -  K(tk+1)C]X(tk+1ltk) 


(6.82-8) 


(6.82-9) 


-  X(tk+1 1  tk) 

«tk*l  I  tk)  CT  [  C  S(tk+1 1  tk>  CT  +  v2  H  C  S(tUi  I  tk) 


and  Kalman  filter  update  Equation  (6.3-6)  is 


*j(tk+i  I  tk+i)  —  x(tk+i  I  tk)  +  K(tk+i)r(tk+i) 


(6.8.2-10) 


At  the  initial  time  tQ,  the  only  non-zero  partial  derivatives  are  by 
Equations  (6.82-5)  and  (6.82-6) 


9x,(to) 


9x2(to) 


(6.8.2-11) 


5lii(to) 


=  2  s,, 


0l22(to) 


=  2s2 


(6.8.2-12) 


The  only  non-zero  partial  derivative  of  the  state  transition  matrix  is 
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0  -Ate‘c'At 


(6.8.2-13) 


The  partial  derivatives  of  the  state  propagated  from  time  t^  to  tR  +  i 

with  respect  to  parameters,  initial  conditions,  or  initial  state  standard 
deviations  oq  are  by  Equation  (.6.3-3) 

3x(tk+i  I  tk)  3 A  ~  A  3x(ijtk)  3B  „„„ 

- 5 - =  3 —  x(tk  I  tk)  +  A — - -  +  —  (6.8.2-14; 

3  a;  3a;  3a*  3ai 

where  the  only  non-zero  partial  derivative  of  B  is 


3B  _  At 
da  ~  „ 


(6.8.2-15) 


The  partial  derivatives  of  the  state  covariance  propagated  from  time  tk 
to  tk+i  with  respect  to  parameters,  initial  conditions,  or  initial  state  standard 
deviations  cq  are  by  Equation(6.3-4) 


3£(tk+i  1  tk) 
3cq 


3  A  3AT 

-xmvat  +  akw— 

.  a«tkitk)  3Q 

+  A  — - AT  +  — 

c/CXj  d(X| 


(6.8.2-16) 


where 


Q  =  L  5  LT  =  L  LTAt 


(6.8.2-17) 


so  that 


3Q  3L  _  3LT 

r  =  —  LTAt  +  L  -  At 

C/Ctj  dot  j  dotj 


(6.8.2-18) 


with  the  only  non-zero  partial  derivatives  of  L  being 


3L  r  i  o  1  3L_  0  0  _3L_  0  0 

3b  "  [  o  0  ’  3q  “  0  C2  '  3c2  ~~  L  0  ci 


(6.8.2-19) 


Also  the  partial  derivative  of  the  measurement  noise  covariance  is 
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3v2  0  if  Oti  *  V 

3oq  2v  if  a*  =  v 


(6.8.2-20) 


At  this  point,  recalling  the  definitions  of  the  residual  and  its 
covariance,  their  partial  derivatives  may  be  written  as 


^  ^^k+l)  _  dx(tk+l  1 tk) 


3cq 


3ai 


(6.8.2-21) 


3  S(tk+1)  3ln(tk+1 1  tjj)  2_3Xi2(tk+1 1  tk] 


3tXi 


3cq 


3oq 


S^22(tk+l  I  tk)  „  3v 
+  -  +  2v3^ 


(6.8.2-22) 


The  partial  derivatives  of  the  Kalman  filter  gain  matrix  are  by 
Equation  (6.8.2-8) 


3K(tk+1) 

3oii 


_J _ 


aEll(tk+l  1  tk)  +  ^12(tk+l  I  tk) 

3oi  3ai 

^l(tk+l  I  tk)  +  ^^22(tk+l  I  tk) 

3cti  3ccj 


1 _ 

S(tk+i)2 


dS(tk+i)  Sn(tk+1 1  tk)  +  Zi2(tk+1 1  tk) 
50Ci  Z2l(tk+1 1  tk)  +  X22(tk+1  i  tk) 


(6.8-23) 


The  partial  derivatives  of  the  updated  state  at  time  tk+j  given  the 
measurements  up  to  time  tk+j  are  by  Equation  (6.8.2-10) 

d£(tk+i  I  tkfi)  =  g2<(tk+i  I  tk)  +  K(tk4])g£(^i),  +  3^)^)  (6.8-24) 
3ai  3(Xi  3ai  3aj 


The  partial  derivatives  of  the  updated  state  covariance  at  time  tk+1  given  the 
measurements  up  to  time  tk+i  are  by  Equation  (6.8.2-9) 
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d£(tk+l  i  tk+l)  _  3S(tk+i  I  tiJ  tr(^  \  r'  ^-^k+l  I  tk) 

r  —  r  K.V.tjc+1  j  — - 

dOi  dCXi  dCXi 


3K(tk+l) 


c  £(tk+i  I  tk) 


(6.8.2-25) 


In  this  simplified  case  the  components  of  the  gradient  of  the  negative 
log-likelihood  function  (6.6-2)  may  now  be  written  as 

d£(z(tk+1)  I  zk;a)  _  1  dS(tk+1)  r(tk+1)  dr(tk+1) 

daj  2S(tk+i)  3cq  S(tk+i)  3oq 

-2S(ck+1)2  acq  (6'a2'26) 

and  the  Hessian  (6.5-6)  of  the  negative  log-likelihood  function  is 


Aij  -  ^  Aij(tic) 
k=0 


(6.8.2-27) 


with 


.  .  .  d2C(z(tic)lzk-l ;fit) 

Aij(tk)  =  - t—t - 

1  dajdOj 


(6.8.2-28) 


Inserting  Equations  (6.8.2-26)  into  Equation  (6.6-3)  yields  the  Fisher 
information  approximation  to  the  Hessian 


Aij(tic)  =  E{Aij(tk)} 


f  1  dS(tk)  dS(tk) 

^  4  S(tk)2  daj 

r(tk)  r  dS(tk)  dr(tk) 

2  S(tk)2  9aj  daj 

+  dS(tk)  dr(tk)  j 

daj  dai 
r(tk)2  dr(tk)  dr(tk) 

S(tk)2  3ocj  daj 
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r(tk)4  9S(tk)  9S(tk) 

+  4  S(tk)4  0aj  8aj 

r(tk)2  8S(tk)  8S(tk) 

2  S(tk)3  0oq  3aj 

r(tk)3  r  8S(tk)  8r(tk) 
2  S(tk)3  Bcq 
+  8S(tk)  8r(tk)  j 
8aj  8oq 


(6.8.2-29) 


If  the  following  facts  are  applied  to  the  above  equation 

r(tk)  :  zero  mean  Gaussian  distribution  with 

covariance  =  S(tk+1) 
third  moment  =  0 
fourth  moment  =  3S(tk+|)2 


8r(tk) 

8cq 


non-random  function  relative  to  yk  I  yk-i,—,yo 


S(tk)  :  non-random  function  relative  to  yk+1 1  yk/...,yi 


it  reduces  to 


so  that 


E{Aij(tk)} 


1  8S(tk)  8S(tk) 

2  S(tk)2  3oq  3aj 

1  8r(tk)  8r(tk) 
S(tk)  3a- 


(6.8.2-30) 


^  f  1  8S(tk)  8S(tk) 

-  k=o  2  S(tk)2  doq  8otj 

1  8r(tk)  8r(tk) 
S(tk)  8cq  8cq 


(6.8.2-31) 


As  expected,  Equation  (6.8.2-31)  is  the  scalar  version  of  Equation  (6.6-4). 


97 


6.8.3  Application  of  FIMLOF  to  Experimental  Data 


The  FIMLOF  estimation  software  described  in  Section  6.7  for  the  second 
order  scalar  observable  model  of  Section  6.8.2  was  applied  to  the  10  point 
averaged  data  of  Figure  6.8.1.  Two  fits  were  performed  using  different  time 
spans  of  data.  One  fit  covered  30,000  seconds  and  the  other  48,500  seconds. 
The  results  of  these  fits  are  contained  in  Table  6.8-1. 


Table  6.8-1  Results  of  FIMLOF  Fit  to  Accelerometer  Data 


30,000  sec  of  Data 

48,500  sec  of  Data 

Initial  Conditions: 

xi  (units) 

X2  (units) 

0.00  ±  10.0 
-0.158  ±  1.0 

0.10  ±10.0 

0.002  ±1.0 

Trend:  a  (units/sec) 

2.09  x  10'5  ±  2.8  x  10-5 

7.94  x  lO'7  ±  2.7  x  10-5 

Random  Walk  Standard 
Deviation:  b  (units) 

5.10  x  10*3  ±  1.2  x  10-3 

5.89  x  10-3  ±  1.0  x  10-3 

Exp.  Corr.  Noise: 

Inv.  Time  Const.:  ci  (1/sec) 
Scaling  Param.:  C2  (units) 

5.74  x  10-3  +  i.5  x  10-3 
2.70  ±  0.86 

6.59  x  10-3  + 1.4  x  io-3 
2.76  ±  0.54 

White  Noise  Standard 
Deviation:  v  (units) 

0.174  ±  0.003 

0.173  ±  0.002 

RMS  Pre-update 

Residual:  (units) 

0.20 

0.20 

These  numerical  results  present  some  information  about  the  quality  of 
the  fit.  It  is  important  to  note  that  the  addition  of  more  measurements 
decreased  the  uncertainty  of  the  estimates  as  measured  by  the  Cramer-Rao 
lower  bound.  This  lower  bound  should  improve  by  a  factor  of  the  square  root 
of  the  ratio  of  the  first  to  the  second  number  of  measurements.  The 
extremely  low  estimate  of  the  trend  and  the  associated  high  uncertainty 
indicates  that  there  is  no  significant  trend  over  the  time  span  of  the  data. 
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Because  the  goal  was  to  approximate  a  PSD  with  a  -1  log-log  slope,  a 
better  test  of  the  quality  of  the  fit  occurs  in  the  frequency  domain.  Simulated 
sample  paths  were  produced  with  the  techniques  described  in  Section  4.8. 
These  simulations  of  the  system  model  (6.8.2-1)  used  the  parameters  given  by 
the  48,500  second  fit.  Figure  6.8-4  is  an  example  of  one  such  path  and 
Figure  6.8-5  shows  a  PSD  generated  from  the  data  using  2,048  points.  The  PSD 
is  nearly  identical  to  Figure  6.8-3  which  shows  the  frequency  content  of  the 
averaged  data.  This  is  reasonable  because  of  the  measurement  averaging  that 
was  used  to  reduce  the  number  of  data  points. 
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Figure  6.8-4  Simulated  Sample  Path 
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Frequency  (Hz) 

Figure  6.8-5  PSD  of  Simulated  Sample  Path 


This  example  shows  that  FIMLOF  may  be  used  to  identify  a  Markov 
model  which  will  approximate  the  power  spectrum  of  a  fractional  Brownian 
motion  process  over  a  given  frequency  range.  However,  a  PSD  produced 
using  a  longer  time  span  of  data  would  show  flattening  in  the  lower 
frequencies.  Increasing  the  valid  range  beyond  the  3  decades  presented  in  this 
example  requires  that  more  states  be  added  to  the  model.  Reference  [21] 
provides  an  analysis  of  the  number  of  poles  required  to  approximate  a 
-1  slope  over  a  given  number  of  decades. 


6.9  Results  of  Fits  to  Computer  Generated  Sample  Paths 

In  order  to  further  demonstrate  the  accuracy  of  FIMLOF,  a  sample  path 
consisting  of  the  sum  of  a  linear  trend,  random  walk,  exponentially  correlated 
noise,  and  white  noise  was  generated  using  the  method  described  in 
Section  4.8.  This  sample  path  spanned  327.6  seconds  with  a  time  step  of  0.01 
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seconds.  The  estimator  used  every  tenth  point  of  the  sample  path  as  a 
measurement.  The  FIMLOF  estimator  for  this  problem  is  the  same  as  that 
derived  in  Section  6.8.  Table  6.9-1  gives  the  results  of  the  test  case. 


Table  6.9-1  Results  of  FIMLOF  Fit  to  Computer  Generated  Sample  Path 


Simulation  Value 

Estimate 

Initial  Conditions: 

x\  (units) 

0.00 

-1.70  ±1.3 

X2  (units) 

0.00 

1.75  ±  1.7 

Trend:  a  (units/sec) 

1.00 

0.98  ±  0.05  • 

Random  Walk  Standard 
Deviation:  b  (units) 

1.00 

0.9611.0 

Exp.  Corr.  Noise: 

Inv.  Time  Const.:  ci  (1/sec) 

0.64 

1.76  ±0.38 

Scaling  Param.:  C2  (units) 

1.56 

-0.92  ±  0.22 

White  Noise  Standard 

0.10 

0.163  ±  0.031 

Deviation:  v  (units) 

Better  results  would  be  obtained  for  the  parameter  C2by  using  more 
data  or,  if  that  were  not  possible,  the  parameter  should  be  constrained  to  be 
non-negative  and  new  results  computed  for  the  other  parameters.  The 
results  show  that  the  estimator  does  a  good  job  of  estimating  the  trend, 
random  walk,  and  white  noise  parameters  because  there  is  a  great  deal  of 
information  about  these  parameters  present  in  the  measurements. 

The  case  was  simplified  by  eliminating  the  exponentially  correlated 
noise  state  from  the  simulated  sample  path.  The  estimator  produced  the 
results  contained  in  Table  6.9-2.  Note  that  the  estimates  have  all  improved  in 
quality,  with  both  their  uncertainties  and  the  amounts  that  they  differ  from 
the  simulation  values  having  decreased. 
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Table  6.9-2  Results  of  FIMLOF  Fit  to  Simplified  Sample  Path 


Simulation  Value 

Estimate 

Initial  Condition: 

xi  (units) 

0.00 

-0.12  ±  0.34 

Trend:  a  (units/sec) 

1.00 

0.98  ±  0.05 

Random  Walk  Standard 
Deviation:  b  (units) 

1.00 

0.98  ±  0.02 

White  Noise  Standard 
Deviation:  v  (units) 

0.10 

0.14  ±  0.008 
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Application  of  Maximum  Likelihood 
Estimation  to  Fractional  Brownian  Motion 


7.1  Modeling  a  Non-Markov  Process 

Because  fractional  Brownian  motion  is  not  a  Markov  process,  it 
cannot  be  fully  characterized  without  knowledge  of  its  state  for  all  past  time. 
For  this  reason,  fractional  Brownian  motion  cannot  be  incorporated  into  a 
Kalman  filter.  These  filters  propagate  from  one  discrete  time  to  the  next 
storing  only  the  state  at  the  previous  time. 

The  parameters  of  a  non-Markov  process  may  be  estimated  by  using  a 
filter  which  stores  information  from  every  past  measurement.  One  method 
of  doing  this,  proposed  by  Lundahl,  et.  al.  [24],  takes  advantage  of  the  known 
correlations  between  the  increments  of  fBm.  An  estimator  of  this  type  may  be 
formulated  to  use  either  the  measurements  or  the  increments  between  the 
measurements  as  observations.  These  formulations  will  be  identified  as  the 
sum  observable  and  the  increment  observable  respectively. 


7.2  Increment  Observable  Formulation 

If  Pn(tk) ls  an  process  with  PH(0)  =  0,  then  EiP^tk+l^pH^k)}  =  0 
because  the  fractional  Brownian  motion  process  is  Gaussian  with  zero  mean. 
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Equation  (5.5-2)  shows  that  if  the  samples  are  uniformly  spaced  (t^  =  kAt)  the 
autocorrelation  function  for  the  increments  of  the  process  is  [24] 

<j>xx(n)  =  E  {  [pH^k+n+l^H^k+n)]  EPH^k+lJ-pHttk)! } 

=  2  {  E{  [PH^k+n+l)  -  PH^k)]2  ) 

+  E  {  [PH(tk+n)  -  pH^k+l)]2  } 

-  E  {  [pH^k+n+l)  ‘  pH^k+l)!2  I 

-  E  {  [pH(tk+n)  -  pH(fk)I2  }  } 

=  At2H  {  I  n+1 1 2H  -2  I  n  1 2H  +  I  n-1 1 2H  }/2  (7.2-1) 

where  the  parameters  oh  and  H  are  the  fractional  Brownian  motion  standard 
deviation  and  dimension  parameters  which  were  defined  in  Chapter  5. 

Because  the  increments  are  zero  mean  random  variables,  the 
correlation  between  two  increments  is  also  equal  to  their  covariance.  This 
allows  the  increments  to  be  considered  to  be  jointly  Gaussian  random 
variables  with  zero  mean  and  covariance  S  where  the  ijth  element  of  S  is 
determined  according  to 

Sjj  =  <(>xx(  I  i  -  j  I )  (7.2-2) 

The  probability  density  function  of  the  increments  then  becomes 

p(z1,...,zN)  =  (27t)-N/2det(S)'l/2  e  -  (zT  S-1z)/2  (7.2-3) 

where 

Pn(ti)  -  Pn(to) 

z=  i  (7.2-4) 

Ph(In)  -  Pn(tN-i) 

Given  a  set  of  measurements,  maximum  likelihood  estimation  may  be 
used  to  determine  the  parameters  H  and  oh  which  maximize  the  likelihood 
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that  those  measurements  occurred.  The  parameters  enter  the  problem 
through  the  (N  x  N)  covariance  matrix  S. 

The  simple  case  above  may  be  extended  to  include  more  complex 
systems  that  are  modeled  as  the  sum  of  several  stochastic  and/or 
deterministic  processes.  Consider  a  scalar  stochastic  process  y(tk)  which  at 
each  discrete  time  tk  is  equal  to  the  sum  of  a  bias  b(tk),  several  zero  mean 
random  variables  xjftk)  with  stationary  Gaussian  increments,  and  a  stationary 
white  measurement  noise  9(tk)  with  E{82}  =  0: 

y(tk)  =  b(tk)  +  xT(tk)  +  x2(tk)  +  . . .  +  xN(tk)  +  ©(tk)  (7.2-5) 

The  increment  from  y(tk)  to  y(tk+i)  is  equal  to 
Ay(tk+i)  =  y(tk+i)  -  y(tk) 

N 

=  b(tk+1)  -  b(tk)  +  X  [xi(tk+i)  -  Xi(tk)]  +  0(tk+1)  -  0(tk)  (7.2-6) 

i=l 

Because  the  stochastic  processes  are  zero  mean,  the  increment  can  be 
converted  to  a  zero  mean  random  variable  by  subtracting  the  bias  term.  This 
gives 

N 

Ay’(tk+i)  =  X  Axi(tk+i)  +  e(tk-n)  -  0(tk)  (7.2-7) 

i=l 


where 


Axj(tk+i)  =  Xi(tk+i)  -  xj(tk) 

The  correlation  between  two  increments  can  now  be  calculated 
<t>yy(j)  =  E{Ay'(tk+j+i)  Ay'(tk+i)} 


=  E 


‘  N 

X  Axj(tk+j+i)  +  9(tk+j+i)  -  0(tfc+j) 
.i=l 


'  n  r 

X  Axm(tk+i)  +  0(tk+i)  -  0(tk)  • 
_m=l  J . 


(7.2-8) 


(7.2-9) 
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If  each  noise  is  assumed  to  be  independent  of  the  others,  this  expression 
reduces  to 


<t>yy(j)  =  EjX  ^i^tk+j+iMxKtk+i) I  +  E{[e(tk+j+i)-e(tk+j)][0(tk+i)-e(tk)]} 


-e|  s  Axi(tk+j+i)Axi(tk+i)  j  +  28(j)0 - 8( I j I -1)0 


(7.2-10) 


where  8(t)  =  1  at  t  =  0  and  is  zero  elsewhere. 


Equation  (7.2-12)  shows  that  the  autocorrelation  function  of  the 
increments  is  equal  to  the  sum  of  the  autocorrelation  functions  of  the 
increments  of  the  individual  stochastic  processes.  This  makes  it  easy  to  build 
<t>yy(j)  for  a  complicated  process  by  summing  the  correlations  of  simpler 
processes.  Fractional  Brownian  motion  can  be  included  by  simply  adding  in 
its  autocorrelation  which  is  given  by  Equation  (7.2-1).  Also,  because  Ay'(tk)  is 
a  zero  mean  process,  4>yy(j)  is  the  covariance  of  its  increments.  This  means 
that  the  increments  are  again  characterized  by  a  multivariate  Gaussian 
probability  density  function. 

Let  the  observable  column  vector  be 


y(ti)  -  yOo) 


y(tN)  -  y(tN-i)- 


(7.2-11) 


with  mean 


b(tN) 


Then  the  probability  density  of  the  observables  is 


(7.2-12) 


p(z, . zN;a)  =  (2K)-N/2det[S(ffi)]-1/2  e  ■  Kz-Uta*’7  Sto^fe-uta)))/2  <7.2-13) 
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where  &  is  the  (m  x  1)  vector  of  parameters  which  characterize  the  stochastic 
and  deterministic  processes  which  make  up  the  signal  of  interest  and  S(sx)  is 
defined  by  Equation  (7.2-2)  using  the  appropriate  autocorrelation  function.  If 
there  is  a  trend  a  in  the  data,  b(ti)  =  aAt.  A  maximum  likelihood  estimator 
can  be  used  to  determine  the  parameters  &  which  best  match  the  N  data 
points. 

A  disadvantage  of  the  increment  observable  formulation  is  that  it 
cannot  estimate  the  initial  condition  associated  with  a  trend.  This 
shortcoming  motivates  the  development  of  the  sum  observable  formulation. 


7.3  Sum  Observable  Formulation 

Recall  from  Equation  (5.4-5)  that  the  nonstationary  autocorrelation 
function  for  a  fractional  Brownian  motion  process  is 

<t>pp(tj,tk)  =  E  { IWj)  Pnfo) } 

=  Opj  {  ltjl2H+  Itkl2*1-  I  tj  -  tk  1 2H  }/2  (7.3-1) 

This  autocorrelation  function  is  also  the  covariance  of  the  zero  mean  random 
variables  |3H(tj)  and  Pn^k)-  Thus  the  value  of  the  process  at  times  ti, ... ,  tN  is 
a  set  of  jointly  Gaussian  random  variables  with  zero  mean  and  covariance  S 
where 


Sjk  =  $pp(tj,tk)  (7.3-2) 

The  sum  observable  formulation  may  also  be  extended  to  handle 
processes  which  are  the  sum  of  several  deterministic  and/or  stochastic 
processes.  Consider  a  scalar  stochastic  process  y(tk)  which  at  each  discrete  time 
tk  is  equal  to  the  sum  of  a  bias  b(tk)  several  zero  mean  random  variables  with 
Gaussian  increments  xj(tk)  and  a  stationary  white  measurement  noise  0(tk): 

y(tk)  =  b(tk)  +  x^tk)  +  x2(tk)  +  . . .  +  xN(tk)  +  0(tk)  (7.3-3) 


107 


MAXIMUM  LIKELIHOOD  ESTIMATION  OF  FRACTIONAL  BROWNIAN  MOTION 


If  the  stochastic  processes  are  zero  mean,  this  observable  can  be  made  into  a 
zero  mean  random  variable  by  subtracting  off  the  bias  term  leaving 

N 

y'(tk)  =  X  Xi(t^  +  e(tk))  (7.3-4) 

i=l 


The  correlation  between  two  measurements  can  now  be  calculated 


tyyGktk)  =  E{y'(tj)  y'(tk)} 


=  E 


N 


X  xj(tj)  +  0(tj) 

L  1=1 


N 


X  xm(tk)  +  0(tk) 
J  L  m=l 


(7.3-5) 


If  each  noise  is  assumed  to  be  independent  of  the  others  and  E{02}  =  0,  this 
expression  can  be  reduced  to 


<t>yy(tj/tk)  =  E 


N 

X  xi(tj)xi(tk) 
i=l 


+  © 


(7.3-6) 


Equation  (7.3-6)  shows  that  the  autocorrelation  function  of  the 
measurements  is  equal  to  the  sum  of  the  autocorrelation  functions  of  the 
individual  stochastic  processes.  This  makes  it  easy  to  build  <J>yy(tj,tk)  for  a 
complicated  process  by  summing  the  correlations  of  simpler  processes. 
Fractional  Brownian  motion  can  be  included  by  simply  adding  in  its 
autocorrelation  which  is  given  by  Equation  (7.3-1).  Also,  because  y’(tk)  is  a 
zero  mean  process,  <t>yy(tjrtjc)  is  the  covariance  of  the  measurements.  This 
means  that,  just  as  in  the  increment  observable  case,  the  measurements  are 
characterized  by  a  multivariate  Gaussian  probability  density  (7.2-13)  with  S(&) 
being  defined  by  Equation  (7.3-2).  If  there  is  a  trend  in  the  data, 
b(tj)  =  ai  +  a2Atj. 

A  disadvantage  to  using  the  sum  observable  joint  probability  density 
for  maximum  likelihood  estimation  is  that  the  terms  on  the  diagonal  of  the 
covariance  matrix  increase  steadily  from  the  upper  left  to  the  lower  right 
corner.  This  tends  to  make  the  matrix  poorly  conditioned  as  the  number  of 
measurements  becomes  large. 
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7.4  The  Maximum  Likelihood  Estimator 


The  fact  that  both  the  increment  and  sum  observable  formulations 
reduce  to  a  Gaussian  probability  density  function  is  very  convenient.  The 
maximum  likelihood  estimator  for  this  function  has  already  been  presented 
in  Chapter  6.  In  fact,  the  fractional  Brownian  motion  estimator  is  less 
difficult  to  develop  analytically  because  it  is  a  batch  estimator  and  there  is  no 
need  to  assemble  the  likelihood  function  recursively  using  a  Kalman  filter. 
Recall  that  the  classical  maximum  likelihood  estimator  maximizes  the 
probability  density  function  of  the  measurements  with  respect  to  the 
unknown  parameters.  In  each  of  the  two  formulations  developed  in  the 
preceding  sections,  this  probability  density  function  is  of  the  form 

p(z, . zN;a)  =  (2K)-N/2dettS<s)H/2  e  ‘  te-U<S*»T  S(a)-l<z-u(s»]/2  (7.4-1) 

where  the  Zj  are  the  scalar  measurements  and  [ij  their  means  (subtracted  out 
in  computing  the  covariance  S(&)).  Following  the  procedure  of  Section  6.4, 
the  negative  of  the  logarithm  of  this  function  will  be  minimized  in  order  to 
simplify  the  calculations.  The  negative  log-likelihood  function  without  the 
constant  term  is  thus 

Cfefit)  =  \  ln{  det[S(ffl)] }  +  \  [z-p(fl)]TS(fl)'1[z-u(o;)]  (7.4-2) 


The  gradient  and  an  approximation  to  the  Hessian  of  this  function  are 
required  so  that  Newton-Raphson  iteration  may  again  be  used  to  perform  the 
minimization.  These  partial  derivatives  are  identical  to  those  presented  in 
Section  6.6.  The  gradient  vector  is 


3C(z;&) 

3oci 


3u(a)  i  3S(a) 

-  (z-jiCall^’Sffi)-’1  j  (z-jiCQL»TSCQO_1  5a""" 


+  |tr  S(a)- 


,3S(fi)' 


9aj 


,  i  =  1, ...,  m  (7.4-3) 


and  the  Fisher  information  approximation  to  the  Hessian  is 
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32C(z ;g) 

3aj3aj 


=  tr 


3u(fit)  3u(sx)T  l3S(a)  „3S (fit) 

S(a)-i  +  ^  SCii)-1  —  SCa)*1 


L  3ai  3ai 


3ai 


3a, 


(7.4-4) 


i  1,  ...,  in,  j  —  1, ...,  m 


The  maximum  likelihood  estimator  is  implemented  by  fixing  the 
unknown  parameters,  calculating  the  likelihood  function  and  its  partial 
derivatives,  performing  a  Newton-Raphson  update  to  the  parameters,  as 
described  in  Section  6.5,  and  repeating  the  process  until  the  parameters 
converge.  A  drawback  to  this  method  is  the  computational  burden  that 
comes  with  inverting  the  covariance  matrix  S,  which  has  as  many  rows  and 
columns  as  there  are  measurements. 


7.5  Estimation  Results  with  Pure  Fractional  Brownian  Motion 
7.5.1  Increment  Observable  Model 

Given  scalar  measurements  y(tj)  which  are  taken  at  fixed  intervals  of 
At  from  an  fBm  signal,  increment  and  sum  observable  estimators  may  be 
developed  to  estimate  the  parameters  an  and  H.  For  the  increment 
observable  formulation  the  ijth  element  of  the  covariance  matrix  is  given  by 

S,j  =  OHAt2H{  ln+1  l2H-2  lnl2H+  |n-l  l2H}/2,  n=li-jl  (7.5.1-1) 

The  partial  derivatives  of  these  elements  with  respect  to  the  parameters  are 

=  oh  At2H  {  I  n+1 1 2H  -  2  I  n  1 2H  +  I  n-1 1 2H  }  (7.5.1-2) 

3<jh 

3S"  2 

— =  oH  At2H  { In  I  n+1 1  I  n+1 1 2H  -  2  In  I  n  I  I  n  1 2H  +  In  I  n-1 1  I  n-1 1 2H  } 
oH 

+  Oh  (In  At)  At2H  {  I  n+1 1 2H  -  2  I  n  1 2H  +  |  n-1 1 2H  }  (7.5.1-3) 

where  the  negative  logarithm  of  a  zero  argument  is  ignored  because  it  is 
multiplied  by  zero.  The  measurement  vector  is 
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Z  = 


y(ti)  -  y(to) 


Ly(tN)  -  y(tN-i)J 

The  mean  is  =  0  so  that  its  partial  derivatives  are 

djA 


9oh 


=  0 


o 

3H 


7.5.2  Sum  Observable  Formulation 

For  the  sum  observable  formulation,  the  covariance  matrix 

by 

Sij  =  <Jh(  ltil2H+  ltjl2H-  I  ti  -  tj  1 2H  }/2 
with  partial  derivatives 

aH{  lti|2H+  |h|2H.  |  ti  -  |  2H  } 

3oH  J  J 

^  =  Oh  (  In  I  ti  I  I  ti  1 2H  +  In  I  tj  I  I  tj  1 2H  -  In  |  ti  -  tj  I  I  ti  -  tj  1 2H 

The  measurement  vector  is 

'y(ti)" 

z  = 

-y(tN)- 

and  partial  derivatives  of  the  mean  ja  are  zero 


(7.5.1-4) 

(7.5. 1-5) 

(7.5.1-6) 

is  defined 

(7.5.2-1) 

(7.5.2-2J 

-  (7.5.2-3J 

(7.5.2-4) 
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au 

acjH 


0 


3H 


0 


(7.S.2-5) 


(7.52-6) 


7.5.3  Fits  to  Computer  Generated  Sample  Paths 

The  above  estimators  were  implemented  to  determine  the  parameters 
of  a  sample  path  which  was  generated  using  the  technique  described  in 
Section  5.8.2.  The  sample  paths  were  made  up  of  128  points  spaced  one 
second  apart.  The  parameter  values  used  in  the  simulation  are  given  in 
Table  7.5-1  along  with  the  results  of  the  fits.  In  this  table,  the  "±"  value  is  the 
Cramer-Rao  lower  bound  described  in  Section  2.4.  These  results  show  that 
the  estimators  produce  identical  results,  and  that  they  are  capable  of 
determining  the  fBm  parameters  with  accuracy  on  the  order  of  the  Cramer- 
Rao  lower  bound.  This  accuracy  may  be  improved  by  increasing  the  number 
of  measurements.  Because  the  increment  observable  method  is  better  suited 
for  implementation,  the  remaining  examples  of  this  chapter  will  use  that 
formulation. 


Table  7.5-1  Parameter  Estimates  from  Fits  To  Pure  fBm 


Simulation 

Value 

INCREMENT 

SUM 

Standard  Deviation:  ch 

1.0 

0.988  ±  0.07 

0.988  ±  0.07 

Dimension:  H 

0.1 

0.109  ±  0.03 

0.109  ±  0.03 
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7.6  Results  with  Fractional  Brownian  Motion  Plus  a  Linear  Trend 

If  a  linear  trend  aAt  is  summed  with  the  fBm  signal  described  in 
Section  7.5.1,  the  estimator  changes  very  little.  The  increment  observable 
formulation  of  the  ijth  element  of  the  covariance  matrix  is  still  given  by 

Sij  =  o„  At2H{  | n+1 1 2H - 2  lnl2H+  tn-1  |2H  }/2 ,  n=  li-jl  (7.6-1) 
The  partial  derivatives  of  these  elements  with  respect  to  the  parameters  are 

^  =  0  (7.6-2) 

3a 

— 11  =  aHAt2H{  I  n+1 1 2H  -  2  lnl2H  +  ln-ll2H}  (7.6-3) 

3aH 

ac..  7 

— 51  =  aH  At2H  { In  I  n+1 1  I  n+1 1 2H  -  2  In  I  n  I  I  n  1 2H  +  In  I  n-1 1  I  n-1 1 2H } 

3H  H 

+  Oh  (In  At)  At2H  {  I  n+1 1 2H  -  2  lnl2H+  ln-ll2H}  (7.6-4) 

where  the  parameter  a  defines  the  slope  of  the  trend.  The  measurement 
vector  becomes 


f  y(ti)-ydo)  1 


z  = 


Ly(tN)  -  y(tN-i)J 

The  mean  and  its  partial  derivatives  are 


aAt 


U  = 


L  aAt  -I 


(7.6-5) 


(7.6-6) 
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9a 


At 


9jL 

9an 


9U 

9H 


0 


(7.6-7) 


(7.6-8) 

(7.6-9) 


A  similar  estimator  may  be  developed  using  the  sum  observable  formulation. 

Table  7.6-1  contains  results  of  fits  to  a  sample  path  which  consists  of 
fBm  with  an  additive  linear  trend.  In  the  simulation,  the  fBm  noise 
parameters  were  oh  =  0.7  and  H  =  0.4.  A  single  fBm  sample  path  was 
generated  and  then  the  two  cases  were  created  by  adding  different  trends  to 
that  sample  path.  These  results  show  that  the  magnitude  of  the  trend  did 
not  affect  the  estimate  of  the  fBm  parameters.  It  is  also  important  to  note  that 
the  initial  condition  of  the  trend  cannot  be  estimated  using  the  increment 
observable  formulation. 


Table  7.6-1  Parameter  Estimates  from  Fit  to  fBm  Plus  Trend  (an =.7,  H=.4) 


Case: 

a 

oh 

H 

B 

1.486  ±  0.02 

0.914  ±  0.04 

0.301  ±  0.03 

2.  a  =  5.0 

4.986  ±  0.02 

0.914  ±  0.04 

0.301  ±  0.03 

7.7  Results  with  fBm  Plus  White  Measurement  Noise 

Addition  of  white  Gaussian  measurement  noise  with  zero  mean  and 
standard  deviation  am  to  a  fractional  Brownian  motion  signal  forces  a 
modification  to  the  covariance  matrix.  In  the  increment  observable  case,  the 
ijth  element  of  this  covariance  matrix  is  given  by 
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Sij  =  CTpj  At2H {  I n+1 1 2H - 2  ln|2H+  | n-l  1 2H }/2 

+  2o^8(n)  -  a^8(ln  1-1)  ,  n=  li-jl  (7.7-1) 

where  5(n)  =  1  if  n  =  0  and  0  elsewhere.  The  partial  derivatives  of  these 
elements  with  respect  to  the  parameters  are 

3S*’ 

— l±  =  aHAt2H{  ln+l|2H_2  |n|2H+  |n_i|2H}  (7.7-2) 

3gh 

3S-  2 

— =  cH  At2H  { In  I  n+1 1  I  n+1 1 2H .  2  In  I  n  I  I  n  1 2H  +  in  |  n-i  |  I  n-l  1 2H  } 
cJH 

+  CTh  (In  At)  At2H  {  ln+1  |2H .  2  In  |2H  +  In-1  |2H  }  (77.3) 


4am  if  n  =  0 

-2om  if  n  =  ±1  (7.7-4) 

0  else 

Parameters  were  fit  to  three  separate  sample  paths  using  the  estimator 
above.  These  cases  had  fBm  parameters  cth  =  0.7  and  H  =  0.4  and  varying 
levels  of  measurement  noise.  The  resulting  parameter  estimates  are  shown 
in  Table  7.7-1.  For  am  of  0.025  and  0.25,  the  parameter  estimates  converged, 
but  when  om  was  set  to  1.0,  the  measurement  noise  parameter  diverged. 

There  are  two  possible  reasons  for  the  failure  of  the  algorithm.  It  is 
possible  that  with  only  128  measurements,  the  likelihood  function  was  not 
sharply  peaked.  This  can  be  seen  by  examining  the  singular  value 
decomposition  of  the  symmetric  Fisher  information  matrix  which  may  be 
written  as 

Iij  =  v*v[  (7.7-5) 

i=l 
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where  q  is  the  number  of  parameters,  Oi  is  the  ith  singular  value,  and  Vj  is  the 
direction  associated  with  the  ith  singular  value.  The  parameter  updates  are 
then  given  by 


where  B  is  the  negative  gradient  vector.  Because  the  Fisher  information 
matrix  will  have  small  singular  values  corresponding  to  directions  in  which 
there  is  little  information,  the  corrections  Afl  will  be  large  in  these  directions 
[15].  This  would  send  the  corrections  far  past  the  optimal  parameter  estimates 
during  the  update.  If  the  corrections  are  large  enough,  the  parameter 
estimates  are  no  longer  close  to  the  optimal  values  and  the  code  may 
converge  to  an  extraneous  local  minimum. 

It  is  also  possible  that  for  128  measurements,  the  Fisher  information 
matrix  was  not  a  valid  approximation  to  the  Hessian.  Either  problem  may  be 
solved  by  the  incorporation  of  more  measurements  thus  increasing  the 
amount  of  information  used  by  the  estimator  and  improving  the  Fisher 
information  approximation  to  the  Hessian.  Another  fit  was  performed  using 
200  points,  and  the  algorithm  converged  to  the  estimates  listed  in  Table  7.7-1. 

For  comparison,  another  fit  to  the  case  with  am  =  0.25  was  also 
performed.  This  fit  demonstrates  the  fact  that  the  Cramer-Rao  lower  bound 
will  decrease  when  additional  measurements  are  included.  This  does  not 
assure  a  better  solution,  however,  as  seen  from  the  results  in  the  table. 
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Table  7.7-1  Parameter  Estimates  from  Fit  to  fBm 
Plus  Measurement  Noise  (oh  =  .7,  H  =  .4) 


Case: 

H 

1.  cm  =  0.025  (128  pts) 

0.271  ±  0.09 

0.522  ±  0.09 

0.599  ±  0.11 

2.  am  =  0.25  (128  pts) 

0.353  ±  0.16 

0.651  ±  0.17 

0.461  ±  0.13 

3.  am  =  0.25  (200  pts) 

0.402  ±  0.08 

0.571  ±  0.10 

0.556  ±  0.10 

4.  am  =  1.00  (128  pts) 

— 

1 

1 

1 

1 

5.  am  =  1.00  (200  pts) 

0.838  ±0.19 

0.899  ±  0.32 

0.350  ±  0.13 

7.8  Results  with  fBm  Plus  Exponentially  Correlated  Noise 

Let  xi  (tk)  be  a  discrete  fBm  process  with  parameters  oh  and  H,  and  let 
*2(tk)  be  a  discrete  exponentially  correlated  noise  process  with  reciprocal  time 
constant  ci  and  scaling  parameter  C2-  The  autocorrelation  function  for  X2(tk)  is 
then  given  by 

E{x2(tk)x2(tk+n)}  =  \  C1C2  e^i 1  n  1  At  (7.8-1) 

Let  y(tk)  be  equal  to  the  sum  of  the  two  independent  processes  and  let  Ay(tk) 
be  its  increments.  By  Equation  (7.2-11),  the  autocorrelation  function  of  the 
increments  is  given  by 

<t >xx(n)  =  E{Axi(tk)Axi(tk+n)}  +  E{Ax2(tk)Ax2(tk+n)}  (7.8-2) 

While  the  first  term  of  Equation  (7.8-2)  is  the  known  autocorrelation  of  fBm 
increments,  the  second  term  may  be  manipulated  to  get  a  more  convenient 
form. 

<t>xx(n)  =  E{Axi(tk)Axi(tk+n)}  +  E{[x2(tk)-X2(tk-l)][x2(tk+n)-X2(tk+n-l)]} 
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=  E{Axi(tk)Axi(tk+n)l  +  E{x2(tk)x2(tk  +n)}  -  E{x2(t]t)x2(tk+n-l)} 

-  E{x2(tk-l)x2(tic+n)}  +  {X2(tk-l)x2(tk+n-l)}  (7.8-3) 

Using  Equations  (7.2-1)  and  (7.8-1)  this  reduces  to 

<t>xx(n)  =  anAt2H{  I n+1 1 2H - 2  !n|2H  +  |n-l|2H}/2 

+  \  cic^  [  2e<\  I n  I  At  _  e-c-|  I  n-l  I  At .  e-c1 1  n+l  I  At  ]  (7.8-4) 

Because  both  processes  are  unbiased,  the  autocorrelation  function  for 
the  increments  is  equal  to  their  covariance  so  that 

Sij  =  <t>xx(n)  ,  n  =  li-jl  (7.8-5) 

The  partial  derivatives  of  S  with  respect  to  the  parameters  are  given  by 

=  aHAt2H{  ln+l|2H_2  |n|2H+  |n_i|2H}  (7.8-6) 

aaH 

— ^  At2H  { In  I  n+l  I  I  n+l  1 2H  .  2  In  I  n  I  I  n  1 2H  +  jn  I  n-l  I  I  n-l  1 2H  } 

oH 

+  Gh  (In  At)  At2H  {  | n+l  1 2H  _  2  |n|2H+  |n-l|2H}  (7.8-7) 

^1  =  -  [  2e-<q  I  n  I  At  _  e-C]  I  n-l  I  At .  e-C|  I  n+l  I  At  ] 

5ci  22 

- 1  At  C1C2  [  2 1  n  I  e^l 1  n  I  At .  I  n-l  I  e^l 1  n-l  I  At .  I  n+l  I  g-Cj  I  n+l  I  At  ]  (7.8-8) 


3S** 

— -1  —  C|C2  [  2e<l  I  n  I  At  -  g-c^  I  n-l  I  At  _  g-c  j  I  n+l  I  At  ]  (7.8-9) 

0C2 

The  measurement  vector  and  the  partial  derivatives  of  the  mean 
vector  are  are 


z  = 


y(tO  -  y(to) 


y(tN)  -  y(tN-i) 


(7.8-10) 
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3jl 

J  =  ° 

3cth 

(7.8-11) 

3jl 

—  =  0 

3H 

(7.8-12) 

3ji 

*  =  ° 

3ci 

(7.8-13) 

3ji 

3c2  =  ° 

(7.8-14) 

A  fit  to  a  computer  generated  sample  path  with  parameters  an  =  0.7, 
H  =  0.4,  ci  =  0.5,  and  C2  =  0.9,  failed.  The  fit  used  128  measurements  spaced  at 
one  second  intervals.  This  example  demonstrates  the  fact  that  the  addition  of 
a  fourth  parameter  greatly  increases  the  computational  burden  while  also 
requiring  many  more  measurements  to  get  a  reasonable  estimate.  For 
comparison,  using  128  measurements,  one  iteration  of  the  case  involving 
fBm  plus  white  measurement  noise  required  44.3  seconds  of  CPU  time,  while 
a  single  iteration  of  the  case  of  fBm  plus  exponentially  correlated  noise  took 
72.0  seconds  to  complete.  Of  course,  the  CPU  time  required  to  accomplish  an 
iteration  depends  on  the  complexity  of  the  parameter  partial  derivatives  as 
well  as  on  their  number. 

Increasing  the  number  of  measurements  by  even  a  small  amount  also 
greatly  increases  the  time  required  to  perform  an  iteration  because  the 
operations  required  to  invert  the  covariance  matrix  increases  in  proportion  to 
the  number  of  measurements  cubed.  For  example,  in  the  exponentially 
correlated  noise  case,  changing  the  number  of  measurements  from  128  to  200 
resulted  in  a  corresponding  increase  of  CPU  time  from  72.0  to  236.7  seconds. 


7.9  Results  of  Fit  to  Accelerometer  Data 

The  maximum  likelihood  estimator  was  used  to  perform  a  fit  of  the 
fractional  Brownian  motion  model  to  the  accelerometer  data  described  in 
Section  6.8.  In  order  to  decrease  the  number  of  data  points,  while  still 
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capturing  the  low  frequency  behavior  of  the  system,  the  1  Hz  data  was 
averaged  over  intervals  of  150  seconds.  The  fit  was  then  performed  using  128 
of  these  averaged  measurements.  The  averaging  process  had  the  effect  of 
limiting  the  frequency  domain  information  contained  in  the  signal  to  the 
range  between  5.2  x  10*®  Hz  and  3.33  x  10"^  Hz.  The  PSD  of  the  averaged  data 
was  nearly  identical  to  that  shown  in  Figure  6.8-2  over  the  frequency  range  of 
interest.  The  results  of  this  successful  fit  are  contained  in  Table  7.9-1.  A  value 
of  H  =  0.212  is  indicative  of  a  process  with  log-log  PSD  slope  equal  to  -1.424, 
while  the  PSD  slope  of  Figure  6.8-2  actually  displays  a  -1  log-log  PSD  slope. 


Table  7.9-1  Parameter  Estimates  from  Fit  to  Accelerometer  Data 


Parameter: 

Estimate 

Cramer-Rao 
Lower  Bound 

OH 

2.725 

0.72 

H 

0.212 

0.04 
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Review  of  Frequency  Domain  Properties 
of  Convolution  and  Correlation 


Chapter  3  employs  a  frequency  domain  property  of  autocorrelation  to 
determine  an  estimate  of  the  power  spectral  density  of  a  stationary,  ergodic, 
stochastic  process.  Correlation  is  very  similar  to  the  more  well  known 
operation  of  convolution.  This  appendix  reviews  the  frequency  domain 
properties  of  these  operations. 

The  convolution  (x*y)(t)  =  x(t)*y(t)  of  two  time  domain  functions  x(t) 
and  y(t)  with  Fourier  transforms  X(f)  and  Y(f)  is  defined  to  be  [34] 

oo 

x(t)*y(t)  =  J  x(x)  y(t-x)  dx  (A-l) 

-Oo 


oo 

=  J  x(t-x')  y(x')  dx'  =  y(t)*x(t) 

-oo 


where  the  second  line  is  otained  by  a  change  of  integration  variable,  so  that 
convolution  is  a  commutative  operation. 

The  Fourier  transform  of  Equation  (A-l)  is 


J  x(t)*y(t)  e’2nift  dt  =  J 


|  x(x)  y(t-x)  dx  |  e‘27tift  dt  (A-2) 
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where  the  frequency  f  is  given  in  Hz. 

Making  the  change  of  variables  t  =  x  +  z  leads  to 

OO  CO  oo 

J  X(t).y(t)  e-2”'*  dt  =  J  J  x(t)  y(z)  e-2"if(t+z)  dt  dz 

-OO  -oo  -oo 

oo  oo 

=  j  x(t)  e'2niH  dx  J  y(z)  e*2icifz  dz 

-oo  -oo 

=  X(f)  Y(f)  (A-3) 

Thus  the  Fourier  transform  of  a  convolution  is  equal  to  the  product  of 
the  individual  Fourier  transforms  of  the  functions  of  interest.  Equation  (A-3) 
is  known  as  the  convolution  theorem  [34]. 

In  the  case  of  correlation,  the  results  are  slightly  different  from  that  of 
convolution.  The  correlation  of  two  time  domain  functions  x(t)  and  y(t)  is 
defined  to  be  [8] 


<t>xy(x)  =  J  x(t+x)  y(t)  dt 


=  J  x(t+2>  y(t-|)  dt 


(A-4) 


where  the  second  line  is  obtained  by  a  change  of  variable. 

Taking  the  Fourier  transform  of  both  sides  of  Equation  (A-4)  gives 


j  <j>Xy(x)  e'2rtifT  dx  =  J 


J  x(t+x)  y(t)  dt 


»-2nifx 


dx 


(A-5) 


Making  the  change  of  variables  x  =  -t  +  z  leads  to 
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©O  CO  oo 

J  <t>xy(T)  e'27tift  dx  =  |  |  x(z)  y(t)  e'27tif('t+z)  dt  dz 

-CO  '  -CO  -oo 


CO  CO 

=  J  x(z)  e  -2nifz  dz  J  y(t)  e  2nift  dt 

-OO  -CO 

=  X(f)  Y*(f)  (A-6) 

where  the  superscript  *  denotes  complex  conjugate. 

Equation  (A-6)  shows  that  the  frequency  domain  properties  of  a 
correlation  are  slightly  different  from  those  of  a  convolution.  The  Fourier 
transform  of  a  correlation  is  equal  to  the  product  of  the  Fourier  transform  of 
the  first  term  with  the  complex  conjugate  of  the  Fourier  transform  of  the 
second. 
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Appendix  B 


Derivative  of  the  Natural  Logarithm 
of  a  Determinant 


For  an  nxn  matrix  S  the  determinant  is  [1] 

det(S)  =  ^  £i|„.in  Sii1—Snin 

ii-in 

=  X  inSiil”*Sinn 


where  Sjj  denotes  the  ijth  element  of  S  and 

I  0  if  any  ij  =  ik  (j#k) 


£ii...in  =  (+1  if  ii-.-in  is  an  even  permutation  of  l...n 
|  -1  if  ii-.in  is  an  odd  permutation  of  l...n 


(B-l) 

(B-l)’ 


(B-2) 


For  any  two  square  matrices  A  and  B  we  have  [1] 

det(AB)  =  det(A)  det(B)  (B-3) 

The  inverse  of  S  is  [1] 

S'1  =  (matrix  of  cofactors  of  S)T  (B-4) 


where  superscript  T  denotes  transpose  and  the  entries  in  the  matrix  of 
cofactors  of  S  are 

(cofactor  S)ab  =  (-l)a+b  det  (S  with  row  a  and  column  b  deleted)  (B-5) 
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Note  that  if  S  is  symmetric,  then 

S  =  ST  S'1  =  (S-l)T  (B-6) 

Differentiating  Equation  (B-l)  with  respect  to  a  =  some  oq,  we  obtain  [1] 


d  det(S)  _  £je^  g  with  row  i  replaced  by  its  derivative  ) 
i=l 


3a 


n  n  3S  - 

=  (cofactor  S)ij-  11 

i=lj=l 


3a 


(B-7) 


If  S  is  symmetric  this  implies 

3  det(S) 


3a 


r  ,  3S  i 
=  det(S)  trace  [S'1  —  J 
3a 


(B-8) 


where  the  trace  of  a  square  matrix  is  the  sum  of  the  diagonal  terms.  Therefore 
(see  Equation  (6.6-2)) 


3  ln[det(S)] 
3a 


trace  [s_1  ^  ] 
3a 


(B-9) 
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Derivation  of  Equation  (6.6-4) 


This  appendix  fills  in  some  of  the  details  which  were  left  out  of  the 
same  derivation  in  Reference  [37].  To  derive  Equation  (6.6-4)  it  is  necessary  to 
know  the  third  and  fourth  moments  of  a  multivariate  Gaussian  distribution. 
These  moments  are  derived  first.  Let  c  be  a  nonrandom  vector,  let  A  and  B  be 
symmetric,  nonrandom  matrices,  and  let  r  be  a  multivariate,  zero  mean, 
Gaussian  random  variable  with  covariance  matrix  S.  The  third  moment  of  r 
is  given  by 


E  { [cTr]  [  rTAr  ] }  =  q  Ajk  E{  q  rj  rk  }  (C-l) 

i,j,k 

Now 

E{  rj  rj  rk  (  =  1  J ...  J  r;  rj  rk  e  '  £T  S'h-/2  (C-2) 

J  V(27t)ndet(S) 


where  r  has  n  elements.  Let  y  =  Br  be  such  that  its  covariance,  given  by  cov(y), 
is  a  diagonal  matrix  with  ones  on  the  diagonal: 


5jk  -  [cov(y)]jk  -  ^  BjaBkbSab 

a,b 


Since  r  =  Dy  with  D  =  B*1, 


(C-3) 


^jk  _  ^  DjaDka 
a 


(C-4) 
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Now  consider  E{  y,  yj  yk  }  given  by  an  integral  like  Equation  (C-2)  with  S 
diagonal.  If  all  the  i,j,k  are  different,  the  integral  is  zero  (mean  of  a  first  order 
Gaussian).  If  two  indices  are  the  same  and  one  different  the  integral  is  again 
zero  for  the  same  reason  applied  to  the  one  different  index.  Finally,  if  all 
three  indices  are  the  same,  the  integral  is  zero  as  the  third  moment  of  a  first 
order  Gaussian  [18].  Since  r  =  Dy,  E{  q  rj  rk  }  =  0  and  Equation  (C-l)  is  zero. 

The  fourth  moment  of  a  scalar  Gaussian  random  variable  is  given  by 

[18] 


E{yi4}  =  3  cov(yj)2  (C-5) 

This  implies 

/  3  if  i  =  j  =  k  =  m 

E{  yi  yj  yk  Ym }  =  j  1  if  i  =  j  *  k  =  m  (C-6) 

I  0  if  some  index  is  different  from  others 


Substituting  the  known  relation  for  ri  gives 


E{rarbrcrd}  =  E{£  Dai  yj  X  Daj  yj  X  Dak  yk  X  Dam  Ym } 

i  j  k  m 

=  3  X  DaiDbjDcjDdj  +  X  X  E^ai^bi^ck^dk 
i  i  k*i 

+  X  S  E)aiDbkDCjDdk+  X  X  E>aiDbkDckDdi 

i  k*i  i  k?ti 

=  SabScd  +  ^ac^bd  +  ^ad^bc  (C-7) 


Equation  (C-7)  may  then  be  generalized  to 

E[[rTAr][rTB/]}  =  E{  £  Aab  rarb  X  Bcd  rcrd  } 

a,b  c,d 

=  X  AabBcd  Etrarbrcrd} 
a,b,c,d 
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-  ^  Aab^cd  [Sabred  +  ^ac^bd  +  ^ad%c] 

a,b,e,d 

=  -^ab^ab  1  t  ^  ®cd^cd  ]  +  ^  Aa]3BC(jSacSj5Cj 
a,b  c,d  a,b,c,d 


=  [tr(SA)]  [tr(SB)]  +  2  tr[SASB]  (C-8) 

The  fact  that  the  third  moment  is  zero  and  Equation  (C-8)  are  now  used 
to  derive  Equation  (6.6-4)  from  Equation  (6.6-3).  The  explicit  dependence  of  r 
and  S  on  tk  and  &  will  be  left  out  on  the  right  hand  side  for  clarity.  Inserting 
Equation  (6.6-2)  into  the  term  within  the  summation  in  Equation  (6.6-3)  gives 
the  following 


dC(z(tk)  I  zk-];g)  3^(z(tk)  1  z^1;^)  | 
3(Xj 


+ 


+ 


At  this  point,  recall  that  the  covariance  matrix  S  is  a  nonrandom 
function  and  note  that  the  partial  derivative  of  the  residual  r  with  respect  to 
the  unknown  parameters  is  also  a  nonrandom  function.  Using  these  facts 
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along  with  the  moments  derived  in  this  appendix  eliminates  the  4th,  5th,  6th, 
and  7th  terms  of  Equation  (C-9)  and  leaves 


e{ 


3£(z( tk)  I  zkA;g)  3C(z(tk)  I  z k~1;g) 


3aj 


3otj 


l  r  3rT  3r  ^  ,  -j 

uCCj  oOtj 

as 


+  lftt[^S-.]  tr[fs-l] 

4  daj  3ctj 

r  3S  -  3S  .  I  \ 

+  2tr[  —  S-i—  S-lJI 
da,  oOCj 

-Ms-,£]»rs,|]  (C-1W 


Thus 


E  {  32Cfe(tk)'zk'1;a)  }  -  B  [3f  fs-i  ]  +  *  tr  [fS-S-l  f  S-i  ]  (Ml) 
dai3aj  daj  3(Xj  2  3a}  daj 

which  becomes  Equation  (6.6-4)  when  the  summation  and  the  dependence  on 
tk  and  &  are  reapplied: 


Aii  = 


N 

2> 

k=0 


3r(tvja)  3r(tid<2i)T 


doti 


3a; 


S(tida)'1 


+  L  3Sitk;^-  SOms)’1  S(tk ;ffl) 


2  daj 


da; 


(C-12) 
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Numerical  Algorithms 


D.l  FORTRAN  Storage  Convention 

An  mxn  matrix  Z  =  (Zjj)  is  stored  as  a  vector  in  FORTRAN  with  the 
first  (row)  index  being  the  most  rapidly  varying  (the  opposite  of  the 
convention  in  PL/I,  Pascal,  and  other  languages): 

Zjj  =  Z[  j*m+i  ],  1  <  i  <  m,  1  <  j  <  n 

This  FORTRAN  storage  convention  is  taken  advantage  of  in  the 
software  written  to  do  the  data  analyses  discussed  in  this  thesis,  when  part  of 
a  multidimensional  array  W(m,n,k)  is  sent  to  a  subroutine.  If  the  subroutine 
expects  W(m,n),  and  the  jth  mxn  part  of  W  is  to  be  sent  to  the  subroutine,  the 
calling  program  sends  the  address  of  W(l,l,j)  to  the  subroutine.  Some 
computer  languages  would  not  allow  a  mismatching  of  dimensions  between 
a  calling  and  called  subroutine,  but  FORTRAN  does,  and  advantage  is  taken 
of  the  fact. 


D.2  Symmetric  Matrix  Manipulation 

An  nxn  symmetric  matrix  S  =  (Sjj)  with  Sjj  =  Sjj  is  stored  in  lower 
diagonal  form  as  a  vector  in  the  FORTRAN  software.  Thus,  the  FORTRAN 
vector  is 

Sjj  =  S[  (j*(j-l))+i  ] ,  1  <  i  <  j  <  n 


131 


MAXIMUM  LIKELIHOOD  ESTIMATION  OF  FRACTIONAL  BROWNIAN  MOTION 


Use  of  lower  diagonal  form  saves  about  a  factor  of  2  in  computation  time  and 
storage,  which  can  be  important  when  dealing  with  very  large  matrices. 

There  are  subroutines  to  multiply  non-symmetric  and  symmetric 
matrices,  with  appropriate  handling  of  the  indices  in  each  case.  If  a 
symmetric  product  results  from  some  matrix  multiplication  combination, 
such  as  ZSZT,  then  there  is  one  subroutine  to  do  the  whole  combination,  with 
non-symmetric  Z  and  symmetric  S  as  input,  and  symmetric  result  as  output. 

Of  great  importance  in  estimation  algorithms  is  the  inversion  of  a 
symmetric  matrix  A  (such  as  the  Fisher  information  matrix)  and  the  possible 
simultaneous  solution  for  a  set  of  linear  equations  with  the  given  matrix  as  a 
coefficient  matrix  (for  determining  adjustments  in  a  Newton-Raphson 
iteration).  A  private  subroutine  SYMINV  written  by  Norman  Brenner  in 
1968  was  used  to  perform  this  operation.  SYMINV  takes  as  input  a  symmetric 
matrix  A  stored  in  lower  diagonal  form,  a  right  hand  side  vector  B,  and 
simultaneously  in  place  inverts  the  matrix  A  and  solves  for  the  vector  X  in 
the  equation  AX  =  B.  The  operations  being  done  in  place  means  that  the 
symmetric  matrix  A'1  and  solution  vector  X  replace  A  and  B,  respectively,  at 
the  end  of  the  subroutine  execution. 

Gaussian  elimination  is  used  in  SYMINV.  In  order  to  have  the 
algorithm  work  with  the  lower  diagonal  storage,  interchange  of  rows  and 
columns  is  not  used,  and  the  Gaussian  elimination  pivots  are  chosen  to  be 
diagonal  elements. 

The  only  reason  interchange  of  rows  and  columns  is  used  in  Gaussian 
elimination  is  to  allow  off-diagonal  pivots.  At  a  given  stage  of  the  algorithm 
execution,  the  next  pivot  should  be  the  largest  element  remaining  in  the 
matrix.  However,  for  the  covariance  type  matrices  dealt  with  in  estimation 
theory,  the  largest  elements  are  likely  to  be  on  the  diagonal,  so  it  is  no 
restriction  not  to  allow  row  and  column  interchange  and  only  choose  the 
pivots  on  the  diagonal. 

Since  the  determinant  of  the  matrix  A  is  also  needed  in  some 
applications,  Brenner's  subroutine  SYMINV  was  modified  to  simultaneously 
calculate  det(A).  This  is  accomplished  at  each  stage  of  the  Gaussian 
elimination  by  accumulating  the  product  of  the  pivots,  when  the  chosen 
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pivot  divides  into  a  given  row  to  be  multiplied  by  a  scale  and  subtracted  from 
other  rows. 

It  was  found  that  underflows  occurred  in  accumulating  the  product  of 
the  pivots,  even  with  the  automatic  scaling  described  in  Section  D.3.  Since 
ln[det(A)]  is  desired  in  maximum  likelihood  estimation  in  Chapter  7,  the 
code  was  changed  to  accumulating  the  natural  logarithms  of  the  pivot 
elements. 


D.3  Automatic  Scaling 

Suppose  we  had  to  solve  the  equations  AX  =  B  and  simultaneously 
invert  the  symmetric  matrix  A.  There  could  be  numerical  problems  if,  for 
instance,  the  units  of  the  Xj  were  badly  out  of  scale  relative  to  each  other  (such 
as  meters  versus  micro-inches,  etc.).  Either  the  units  of  the  physical  variables 
in  a  problem  could  be  appropriately  chosen,  or  they  can  be  chosen  in  any 
arbitrary  convenient  fashion,  and  an  automatic  way  of  scaling  the  equations 
applied,  the  equations  solved  and  the  matrix  inverted,  and  then  the  inverse 
scaling  applied. 

Suppose  we  changed  units  to  obtain  variables  X'j  =  SjXj.  There  results  a 
new  set  of  linear  equations  A'X'  =  B’,  where 


B'i  =  “  Bi 

si 

With  A  being  a  Fisher  information  type  of  matrix,  it  turns  out  that  an 
appropriate  automatic  scaling  choice  is  [3],  [15] 

Sj  =  VAjj 

With  this  scaling,  it  has  been  found  that  very  large  matrices  can  be 
inverted  and  equations  solved  without  underflows  or  overflows  on  the 
computer.  A  scaling  subroutine  is  placed  between  the  SYMINV  symmetric 
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matrix  inversion  and  solution  subroutine  to  automatically  scale  the  equation 
AX  =  B,  call  SYMINV,  and  from  the  solution  vector  X'  calculate  Xj  =  j  X'j. 


D.4  Checking  Partial  Derivatives  by  the  Difference  Method 

In  addition  to  the  complications  of,  e.g.,  Kalman  filter  code,  the 
algorithms  of  this  thesis  required  the  coding  of  many  partial  derivatives.  A 
mechanical  way  of  verifying  the  partial  derivative  code  by  the  difference 
method  was  applied  [3]. 

Namely,  let  040  and  oqi  be  two  values  of  a  parameter.  For  a  given 
quantity  X  it  was  checked  numerically  that 


1 

r  ax 

ax 

1 

1 

r  x  |  x  |  1 

2 

La«i 

ai=ail  ^ai 

«i=«iO  J 

an-«iO 

L  -X  I  —  x  1  j 

ai=ail  «i=«iO 

When  errors  had  been  eliminated  with  this  check,  the  estimation  code 
was  generally  able  to  work  correctly. 
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